R. B. Lehoucq


Abstract 

The  Arnoldi  algorithm,  or  iteration,  is  a  computationally  attractive  technique  for 
computing  a  few  eigenvalues  and  associated  invariant  subspace  of  large,  often  sparse, 
matrices.  The  method  is  a  generalization  of  the  Lanczos  process  and  reduces  to 
that  when  the  underlying  matrix  is  symmetric.  This  thesis  presents  an  analysis  of 
Sorensen’s Implicitly Re-started Arnoldi iteration (IRA-iteration) by exploiting its
relationship with the QR algorithm. The goal of this thesis is to present numerical
techniques that attempt to make the IRA-iteration as robust as the implicitly shifted
QR algorithm. The benefit is that the Arnoldi iteration only requires the computation
of matrix vector products w ← Av at each step. It does not rely on the dense matrix
similarity  transformations  required  by  the  EISPACK  and  LAPACK  software  packages. 

Five  topics  form  the  contribution  of  this  dissertation.  The  first  topic  analyzes 
re-starting  the  Arnoldi  iteration  in  an  implicit  or  explicit  manner.  The  second  topic 
is  the  numerical  stability  of  an  IRA-iteration.  The  forward  instability  of  the  QR 
algorithm  and  the  various  schemes  used  to  re-order  the  Schur  form  of  a  matrix  are 
fundamental  to  this  analysis.  A  sensitivity  analysis  of  the  Hessenberg  decomposition  is 
presented. The practical issues associated with maintaining numerical orthogonality
among the Arnoldi/Lanczos basis vectors form the third topic. The fourth topic is
deflation  techniques  for  an  IRA-iteration.  The  deflation  strategies  introduced  make 
it  possible  to  compute  multiple  or  clustered  eigenvalues  with  a  single  vector  re-start 
method.  The  block  Arnoldi/Lanczos  methods  commonly  used  are  not  required.  The 
final  topic  is  the  convergence  typical  of  an  IRA-iteration.  Both  formal  theory  and 
heuristics  are  provided  for  making  choices  that  will  lead  to  improved  convergence  of 
an IRA-iteration.

Chapter 1
Introduction 


Many  scientific  and  engineering  problems  lead  to  the  matrix  eigenvalue  problem 
(1.0.1)  $Ax = \lambda Bx,$

where A and B are real matrices of order n. The matrix B, when it arises, is usually
symmetric positive semi-definite. In many situations $B = I$, the identity matrix, and
this  is  the  case  assumed  unless  stated  otherwise.  For  the  remainder  of  the  thesis, 
we  suppose  that  A  is  nonsymmetric  and  real  with  standard  simplifications  when  the 
matrix  is  symmetric. 

This thesis examines a promising variant of Arnoldi’s method [3] for computing
approximations to a few eigenpairs $(x, \lambda)$ of A. The Arnoldi method is an efficient
procedure for approximating a subset of the eigensystem for a large, often sparse,
matrix A. The method is a generalization of the Lanczos process [46] and reduces
to that when A is symmetric. The process, sequential in nature, produces an upper
Hessenberg matrix $H_k$ of order k at the k-th step. The eigenvalues of $H_k$ are used
to approximate a few of the eigenvalues of A. Excellent approximations to some of
the eigenvalues often appear for values of k significantly smaller than the order of
the matrix. The iteration only requires the computation of a matrix vector product
w ← Av at each step. It does not rely on the dense matrix similarity transformations
required by EISPACK [82] and LAPACK [1].

There are a number of numerical difficulties with Arnoldi/Lanczos methods. These
include: 

•  Maintaining  the  orthogonality  of  the  Arnoldi/Lanczos  basis  vectors. 

•  Reducing  the  storage  requirements  of  the  methods. 

•  The  computation  of  multiple  and  clustered  eigenvalues  of  A. 

•  Convergence to a selected group of eigenvalues of A.

•  Handling spurious eigenvalues when orthogonality is not enforced.

Each of these issues is considered in detail during the course of the thesis. Over
a decade of research has been devoted to understanding and overcoming the numerical
difficulties of the Lanczos method. The works of Parlett [61] and of Cullum and
Willoughby [21] study in detail the many specifics of the Lanczos algorithm, while the
paper by Grimes, Lewis and Simon [39] discusses the design and development of high
quality software.

Development of the Arnoldi method lagged behind due to the inordinate computational
and storage requirements associated with the original method when a large
number of steps are required for convergence. The explicitly re-started Arnoldi iteration
(ERA-iteration) was introduced by Saad [74] to overcome these difficulties,
based on similar ideas developed for the Lanczos process by Paige [57], Cullum and
Donath [20], and Golub and Underwood [37]. Karush [44] proposes what appears to
be the first example of a re-started iteration.

A relatively recent variant was developed by Sorensen [83] as a more efficient
and numerically stable way to implement restarting. This technique, the Implicitly
Restarted Arnoldi iteration (IRA-iteration), may be viewed as a truncation of the
standard implicitly shifted QR-iteration. This thesis presents an analysis of an IRA-iteration
that exploits its relationship with the implicitly shifted QR algorithm. This
viewpoint provides an alternate approach to study the Arnoldi/Lanczos iterations in
which the power of the QR algorithm is utilized. The immediate impact is the improvement
of the numerical accuracy and convergence properties of the ARPACK [49]
software package.

The  goal  of  this  thesis  is  to  present  numerical  techniques  that  are  designed  to  make 
the  IRA-iteration  as  robust  as  the  implicitly  shifted  QR  algorithm  for  dense  problems. 
These  schemes  are  analyzed  with  respect  to  numerical  stability  and  computational 
results  are  presented. 

1.1  Organization  of  the  Thesis 

The dissertation is organized as follows. Chapter 2 introduces Arnoldi’s method as
well  as  a  few  of  the  many  associated  fundamentals.  The  QR  algorithm  is  the  subject 
of  Chapter  3.  A  connection  between  the  Arnoldi  method  and  the  implicitly  shifted 
QR-iteration  is  established  that  is  exploited  for  the  remainder  of  the  thesis.  The  idea 
of re-starting an Arnoldi iteration is examined in Chapter 4. The IRA-iteration is
introduced and a comparison between implicitly and explicitly re-starting an Arnoldi
iteration is drawn. Chapter 5 examines the numerical stability of the IRA-iteration by
considering the stability of a Hessenberg decomposition. Connections are made with
the concept of the forward instability of the QR algorithm, re-orthogonalization methods
and the various methods used to re-order the Schur form of a matrix. Deflation
techniques for an IRA-iteration are treated in Chapter 6. A numerically stable scheme
is introduced that implicitly deflates the converged approximations from the iteration.
Two forms of implicit deflation are presented. Convergence of the iteration
is improved and a reduction in computational effort is also achieved. The deflation
strategies make it possible to compute multiple or clustered eigenvalues with a single
vector restart method. A block method is not required. Maintaining orthogonality of
the Arnoldi basis vectors is considered in Chapter 7. The convergence typical of an
IRA-iteration is the subject of Chapter 8. Both formal theory and heuristics are provided
for making choices that will lead to improved convergence of an IRA-iteration.
Chapter  9  summarizes  the  dissertation  and  examines  future  work. 

1.2  Notation  and  Fundamentals  of  Matrix  Computations 

We  shall  now  establish  the  notation  to  be  used  during  the  course  of  this  thesis. 
It  is  also  necessary  to  review  a  number  of  details  on  the  matrix  factorizations  and 
techniques  that  will  be  used. 

We employ Householder notational conventions. Capital and lower case letters denote
matrices and vectors, respectively, while lower case Greek letters denote scalars.
The identity matrix in $\mathbf{R}^{n \times n}$ is denoted by $I_n$ and the subscript is dropped when the
context is clear. The j-th canonical basis vector is denoted by $e_j$, the j-th column of
the identity matrix. The transpose of a vector x is denoted by $x^T$ and $x^H$ denotes the
complex conjugate of $x^T$. The norms used are the Euclidean and Frobenius, denoted
by $\|\cdot\|$ and $\|\cdot\|_F$, respectively. The range of a matrix A is denoted by $\mathcal{R}(A)$.

1.2.1  The  Real  Schur  Form 

Since  we  are  especially  concerned  with  algorithms  that  result  in  robust  and  efficient 
software,  the  following  decomposition  is  a  special  case  of  the  more  general  Schur 
decomposition.  The  special  case  allows  us  to  compute  strictly  in  real  arithmetic.  The 
proper resolution of complex conjugate pairs of eigenvalues comes from noting that
if $A(x + iz) = (\gamma + i\mu)(x + iz)$, where x and z are vectors in $\mathbf{R}^n$ with $\mu \neq 0$, then

(1.2.1)  $A \begin{bmatrix} x & z \end{bmatrix} = \begin{bmatrix} x & z \end{bmatrix} \begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}.$

The following decomposition proves central to the eigenvalue algorithms considered
in this thesis.


Theorem 1.1 (Real Schur Decomposition) If $A \in \mathbf{R}^{n \times n}$ then there
exists an orthogonal $Q \in \mathbf{R}^{n \times n}$ such that

(1.2.2)  $Q^TAQ = \begin{bmatrix} R_{11} & R_{12} & \cdots & R_{1m} \\ 0 & R_{22} & \cdots & R_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & R_{mm} \end{bmatrix} = R,$

where each $R_{ii}$ is a square block of order one or two. The blocks of order
two contain the complex conjugate eigenvalues of A. The matrix R is said
to be in upper quasi-triangular form.


Proof  See  [35,  page  362].  □ 
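
As a small illustration of the blocks of order two (a hand-picked example, not one taken from the thesis), consider the matrix below, whose eigenvalues are $\pm i$. With $\gamma = 0$, $\mu = 1$, $x = e_1$ and $z = e_2$, the relation $A[x \; z] = [x \; z]\begin{bmatrix} \gamma & \mu \\ -\mu & \gamma \end{bmatrix}$ can be checked directly:

$$A = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad A(e_1 + ie_2) = \begin{bmatrix} i \\ -1 \end{bmatrix} = i\,(e_1 + ie_2), \quad A \begin{bmatrix} e_1 & e_2 \end{bmatrix} = \begin{bmatrix} e_1 & e_2 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}.$$

Here A is already in real Schur form with $Q = I$ and a single diagonal block of order two. No real orthogonal similarity can make it triangular, because its eigenvalues are not real.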

Let C be a quasi-diagonal orthonormal matrix with two by two blocks allowed
only where R has them. Then $(QC)^TA(QC) = C^TRC$ has diagonal blocks that are
similar to those of R. Thus, apart from the eigenvalues of multiplicity larger than
one, the decomposition is essentially unique given some ordering of the eigenvalues.
Denote the leading principal matrix of k blocks of R by $R_k$ where no $R_{ii}$ is split. Let
$Q_k \in \mathbf{R}^{n \times k}$ be the corresponding columns of Q. Then $AQ_k = Q_kR_k$ is a partial real
Schur decomposition of A of order k. The algorithms of this thesis attempt to compute
a partial Schur decomposition for A with a group of wanted eigenvalues located on
the diagonal blocks of $R_k$. The $k \ll n$ eigenvalues of A requiring approximation are
typically contained within some convex set of interest in the complex plane. Examples
include those nearest the origin, and of largest real part. An important exception
might be the dominant eigenvalues of A, those largest in magnitude.

A quasi-diagonal form for A exists if there is a nonsingular matrix $X \in \mathbf{R}^{n \times n}$ such
that $AX = XD$ where D is a block diagonal matrix with each block of order one
or two. The blocks of order two contain the complex conjugate pair of eigenvalues
as in equation (1.2.1) with $\mu$ positive. The columns of X span the right eigenspace
corresponding to diagonal values of D. For the blocks of order two on the diagonal
of D the corresponding complex eigenvector is stored in two consecutive columns
of X, the first holding the real part, and the second the imaginary part. We also
assume that the columns of X are unit vectors. If we assume that A is diagonalizable,
the matrix R may be further decomposed [1, 35, 86] as $RS = SD$ where
$D = \mathrm{diag}(R_{11}, R_{22}, \ldots, R_{mm})$ and $S \in \mathbf{R}^{n \times n}$ is upper quasi-triangular and nonsingular.
The matrix pair (QS, D) represents a quasi-diagonal form for A.

1.2.2  Elementary  Orthonormal  Matrices 

A real matrix $U \in \mathbf{R}^{n \times n}$ is orthonormal if $U^TU = I_n$. The matrix consisting of any of
the columns of U is called an orthogonal matrix. For example, define $U[e_1, \ldots, e_k] = U_k \in \mathbf{R}^{n \times k}$,
and note that $U_k^TU_k = I_k$ but $U_kU_k^T \neq I_n$ unless k = n. Hence, $U_k$ is
orthogonal for all values of k but only orthonormal when k = n.

Givens  rotations  and  Householder  reflectors  are  two  important  classes  of  simple 
orthonormal matrices that will be used extensively in this thesis. We briefly introduce
their fundamentals and refer the reader to the sources [47, 61, 101] for more
comprehensive  treatments  including  their  numerically  stable  implementation. 

A Householder reflector is a matrix of the form $W = I - \tau ww^T$ where $\tau = 2(w^Tw)^{-1}$
if $w \neq 0$. Direct computation yields that W is orthogonal and symmetric
and hence $W^2 = I$. If we choose the vector $w = x \pm \|x\|e_1$ the Householder matrix W is
such that $Wx = \mp\|x\|e_1$ for $x \in \mathbf{R}^n$. Since W is orthogonal and symmetric it follows
that its first column (and row) contains $\mp x/\|x\|$. The geometrical interpretation of
the transformation effected by W is that it acts as a reflection in the subspace of
dimension n − 1 orthogonal to w.
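
The construction of a Householder reflector can be sketched in a few lines of Python. This is a minimal illustration, not code from the thesis: the function names `householder_vector` and `apply_reflector` are hypothetical, and the sign in $w = x \pm \|x\|e_1$ is taken to match the sign of the first component of x, a common choice that avoids cancellation.

```python
import math

def householder_vector(x):
    """Return (w, tau) with W = I - tau * w * w^T and W x = -sign(x[0]) * ||x|| * e_1."""
    norm_x = math.sqrt(sum(xi * xi for xi in x))
    if norm_x == 0.0:
        raise ValueError("x must be nonzero")
    # Pick the sign of x[0] so that w[0] is formed without cancellation.
    sign = 1.0 if x[0] >= 0.0 else -1.0
    w = list(x)
    w[0] += sign * norm_x
    tau = 2.0 / sum(wi * wi for wi in w)
    return w, tau

def apply_reflector(w, tau, y):
    """Compute W y = y - tau * w * (w^T y) without forming W explicitly."""
    wty = sum(wi * yi for wi, yi in zip(w, y))
    return [yi - tau * wty * wi for wi, yi in zip(w, y)]

x = [3.0, 4.0, 0.0]
w, tau = householder_vector(x)
Wx = apply_reflector(w, tau, x)      # a multiple of e_1 with magnitude ||x|| = 5
WWx = apply_reflector(w, tau, Wx)    # W^2 = I, so this recovers x
```

Applying the reflector through `w` and `tau` costs one inner product and one vector update, which is why W is rarely formed as a dense matrix.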

A Givens rotation $G_{ij} \in \mathbf{R}^{n \times n}$ acts as a rotation in the plane spanned by $e_i$ and
$e_j$. The rotation differs from $I_n$ only in the (i, i), (j, j), (i, j) and (j, i) entries of $G_{ij}$:

$\begin{bmatrix} e_i^TG_{ij}e_i & e_i^TG_{ij}e_j \\ e_j^TG_{ij}e_i & e_j^TG_{ij}e_j \end{bmatrix} = \begin{bmatrix} \sigma & \gamma \\ -\gamma & \sigma \end{bmatrix}.$

An example that illustrates their use is to determine scalars $\gamma$ and $\sigma$ so that
the first column of $G_{12}$ is equal to $\pm x/\|x\|$ for $x \in \mathbf{R}^2$. Equivalently, we solve
$G_{12}^Tx = \pm\|x\|e_1$, and a simple derivation shows that, with $x = (\xi_1, \xi_2)^T$,
$\sigma = \pm\xi_1/\|x\|$ and $\gamma = \mp\xi_2/\|x\|$ give the required result. Thus, $G_{12}^T$ acts as a matrix transformation
that rotates $\mathbf{R}^2$ through a counterclockwise angle $\phi$ where $\tan\phi = -\xi_2/\xi_1$.

If $x \in \mathbf{R}^n$ then we may compute a sequence of Givens rotations so that

$G_{1,2}^TG_{1,3}^T \cdots G_{1,n}^Tx = \pm\|x\|e_1.$

Note that, unlike the corresponding Householder reflector accomplishing the same
task, the product $G_{1,n} \cdots G_{1,2}$ is not symmetric. However, since each Givens rotation
is orthogonal, the product of them is also, which is the important property.

Returning to the previous example of constructing a Givens rotation so that
$G_{12}^Tx = \pm\|x\|e_1$ where $x \in \mathbf{R}^2$ allows us to determine a relationship with the
Householder reflector accomplishing the same task. The relationship is

$\begin{bmatrix} \sigma & \gamma \\ -\gamma & \sigma \end{bmatrix} = \begin{bmatrix} \sigma & -\gamma \\ -\gamma & -\sigma \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix},$

thus expressing a Givens rotation as a product of two reflectors.
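
The two by two case can be checked numerically. The sketch below fixes its own convention, which is an assumption about the layout intended in the text: $G = \begin{bmatrix} \sigma & \gamma \\ -\gamma & \sigma \end{bmatrix}$ with $\sigma = \xi_1/\|x\|$ and $\gamma = -\xi_2/\|x\|$, so that $G^Tx = \|x\|e_1$. The helper names `givens` and `matmul` are hypothetical.

```python
import math

def givens(xi1, xi2):
    """Return (sigma, gamma) defining G = [[sigma, gamma], [-gamma, sigma]]
    such that G^T [xi1, xi2]^T = [r, 0]^T with r = ||x|| >= 0."""
    r = math.hypot(xi1, xi2)
    if r == 0.0:
        return 1.0, 0.0          # x = 0: the identity is a valid rotation
    return xi1 / r, -xi2 / r

def matmul(A, B):
    """Product of two small dense matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

x = [3.0, 4.0]
sigma, gamma = givens(*x)
G = [[sigma, gamma], [-gamma, sigma]]
Gt = [[sigma, -gamma], [gamma, sigma]]
Gtx = [row[0] * x[0] + row[1] * x[1] for row in Gt]   # second component vanishes

# G written as a product of two reflectors: W is symmetric and orthogonal
# and also maps x to ||x|| e_1; F = diag(1, -1) reflects the second axis.
W = [[sigma, -gamma], [-gamma, -sigma]]
F = [[1.0, 0.0], [0.0, -1.0]]
WF = matmul(W, F)
```

With this convention the first column of `G` is $x/\|x\|$, and `WF` reproduces `G` entry by entry.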


1.2.3  The QR Factorization of a Matrix

Given a matrix $B \in \mathbf{R}^{m \times n}$, it will prove useful to be able to factor B into a product
of an orthonormal and an upper triangular matrix, respectively. Such a factorization
allows an orthogonal representation of B's column space.

Theorem 1.2 Suppose that $B \in \mathbf{R}^{m \times n}$ where m, the number of rows,
is at least as large as n, the number of columns. If $l = \mathrm{rank}(B)$ then there
exist a unique orthogonal matrix $Q_1 \in \mathbf{R}^{m \times l}$ and a unique nonsingular
upper triangular matrix $R_1 \in \mathbf{R}^{l \times l}$ with positive diagonal elements such
that

(1.2.3)  $B = QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 & R_{12} \\ 0 & 0 \end{bmatrix},$

where $Q \in \mathbf{R}^{m \times m}$ is an orthonormal matrix.

Proof  See  Golub  and  Van  Loan  [35,  pages  212,  214]  for  algorithmic  derivations 
using  either  Givens  rotations  or  Householder  reflectors.  □ 

It follows that $\mathcal{R}(B) = \mathcal{R}(Q_1)$, thus providing an orthogonal basis for the column
space of B. The unique factorization $B = Q_1R_1$ results if rank(B) = n and amounts
to applying the Gram-Schmidt process to the columns of B. It is interesting to
note that regardless of whether Givens rotations or Householder reflectors are used to
compute the QR factorization of B, both implementations result in the same $Q_1$ and
$R_1$. However, the other orthogonal matrix, $Q_2$, and consequently $R_{12}$ are not uniquely
defined. A word of caution: most algorithms computing the QR factorization of a
matrix produce factors that are unique only up to a scaling of the columns of $Q_1$ and the corresponding
rows of $R_1$ by a factor of ±1. The reason is that we may always compute a diagonal
matrix $D \in \mathbf{R}^{n \times n}$ consisting of only ±1 so that $B = (QD)(D^{-1}R)$ is another
orthogonal factorization of B.
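
For the full column rank case, the factorization $B = Q_1R_1$ with positive diagonal can be sketched with the modified Gram-Schmidt process. This is an illustrative stand-in, not the Givens or Householder procedures cited in the proof; the function name `mgs_qr` and the column-list storage are assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def mgs_qr(columns):
    """QR factorization of a full column rank matrix given as a list of columns.

    Returns (Q, R): Q is a list of orthonormal columns and R is an upper
    triangular matrix (list of rows) with positive diagonal elements.
    """
    n = len(columns)
    Q = [list(c) for c in columns]
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Remove the components of column j along the earlier q_i, one at a time.
        for i in range(j):
            R[i][j] = dot(Q[i], Q[j])
            Q[j] = [b - R[i][j] * a for a, b in zip(Q[i], Q[j])]
        R[j][j] = math.sqrt(dot(Q[j], Q[j]))
        if R[j][j] == 0.0:
            raise ValueError("columns are linearly dependent")
        Q[j] = [b / R[j][j] for b in Q[j]]
    return Q, R

B_cols = [[3.0, 4.0, 0.0], [1.0, 2.0, 2.0]]
Q1, R1 = mgs_qr(B_cols)
# Rebuild each column of B as a combination of the columns of Q1.
rebuilt = [[sum(R1[i][j] * Q1[i][r] for i in range(2)) for r in range(3)]
           for j in range(2)]
```

Because the diagonal of `R1` is a norm, it is positive by construction, which removes the ±1 ambiguity described above.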
Chapter 2


The  Arnoldi  Method 


Arnoldi’s method [3] is an orthogonal projection method for approximating a subset
of the eigensystem of a general square matrix. The method builds, step by step, an
orthogonal basis for the Krylov subspace,

$\mathcal{K}_m(A, v_1) = \mathrm{Span}\{v_1, Av_1, \ldots, A^{m-1}v_1\},$

for A generated by the vector $v_1$. The original algorithm proposed was designed to
compute the Hessenberg decomposition

$U^TAU = H, \quad U^TU = I,$

where H is an upper Hessenberg matrix. As this chapter demonstrates, there is an
intimate connection between Krylov subspaces and Hessenberg matrices.

The  chapter  is  organized  as  follows.  Some  useful  results  concerning  Hessenberg 
matrices  are  presented  in  §  2.1.  The  Arnoldi  factorization  is  introduced  in  §  2.2. 
The  Hessenberg  decomposition  of  A  using  other  orthogonal  reduction  methods  is 
reviewed  in  §  2.3.  Truncated  Arnoldi  factorizations  which  lead  to  real  partial  Schur 
decompositions  are  treated  in  §  2.4.  Determining  how  well  an  eigenvalue  of  the 
projected  matrix  Hm  approximates  an  eigenvalue  of  A  is  considered  in  §  2.5.  The 
convergence  properties  of  Krylov  subspaces  are  briefly  reviewed  in  §  2.6. 

2.1  Fundamentals  of  Hessenberg  Matrices 

Hessenberg matrices hold a fundamental role for the analysis presented in this thesis.
This  section  reviews  many  of  their  most  important  properties. 

We choose to label the i-th diagonal and sub-diagonal elements of $H \in \mathbf{R}^{n \times n}$, an
Hessenberg matrix, as $\alpha_i$ and $\beta_{i+1}$, respectively:

$H = \begin{bmatrix} \alpha_1 & \ast & \cdots & \ast \\ \beta_2 & \alpha_2 & \ddots & \vdots \\ & \ddots & \ddots & \ast \\ 0 & & \beta_n & \alpha_n \end{bmatrix}.$

A Hessenberg matrix is said to be unreduced if all of its sub-diagonal elements are
nonzero. Both the left and right eigenvectors of unreduced upper Hessenberg matrices
possess the following curious properties.

Lemma 2.1 Suppose that $H \in \mathbf{R}^{n \times n}$ is an unreduced upper Hessenberg
matrix. If $Hs = s\theta$ with $s \neq 0$ and $H^Tu = u\theta$ with $u \neq 0$ then $e_n^Ts \neq 0$
and $e_1^Tu \neq 0$.

Proof The proof is by induction on the order of H. Suppose that $H_2$ is an unreduced
matrix of order two. The last row of the equation $H_2s = s\theta$ is

$e_2^TH_2s = \beta_2\sigma_1 + \alpha_2\sigma_2 = \theta\sigma_2,$

where $e_i^Ts = \sigma_i$. If $\sigma_2 = 0$ then $\beta_2\sigma_1 = 0$. Since $H_2$ is unreduced, $\sigma_1 = 0$, which
is a contradiction since by definition eigenvectors are nonzero.

Assume the lemma’s truth for matrices of order n − 1. Let $H_n \in \mathbf{R}^{n \times n}$ be an
unreduced Hessenberg matrix and partition the equation $H_ns = s\theta$ as

$\begin{bmatrix} H_{n-1} & h_n \\ \beta_ne_{n-1}^T & \alpha_n \end{bmatrix} \begin{bmatrix} s_{n-1} \\ \sigma_n \end{bmatrix} = \begin{bmatrix} s_{n-1} \\ \sigma_n \end{bmatrix} \theta,$

where $H_{n-1} \in \mathbf{R}^{(n-1) \times (n-1)}$ and $s_{n-1} \in \mathbf{R}^{n-1}$. Note that $H_{n-1}$ is unreduced since $H_n$ is.
Suppose $e_n^Ts = \sigma_n = 0$, which implies that $\beta_ne_{n-1}^Ts_{n-1} = 0$ and $H_{n-1}s_{n-1} = s_{n-1}\theta$. By
the induction hypothesis $e_{n-1}^Ts_{n-1} \neq 0$ and hence $\beta_n = 0$, which is a contradiction.

The proof for the result that $e_1^Tu \neq 0$ where $H^Tu = u\theta$ also follows from a similar
proof by mathematical induction. □

Unreduced Hessenberg matrices have rank at least n − 1 since the first n − 1
columns are linearly independent. Since $H - \mu I$ is also unreduced, its null space is of dimension one
if $\mu$ is an eigenvalue of H and zero otherwise. If the eigenspace associated
with an eigenvalue is of dimension greater than one, then the corresponding matrix is
derogatory; otherwise the matrix is nonderogatory. It follows then that a symmetric
unreduced tridiagonal matrix cannot have a repeated eigenvalue since a repeated
eigenvalue would imply that the eigenvectors of the symmetric matrix would not
span $\mathbf{R}^n$. The previous discussion is summarized by the following result.

Lemma 2.2 An unreduced Hessenberg matrix is nonderogatory. In particular,
if H is a symmetric matrix all its eigenvalues are distinct.

It follows that an unreduced nonsymmetric Hessenberg matrix is likely to have an
ill conditioned basis of eigenvectors when it has nearly equal eigenvalues. When there
is a repeated eigenvalue the lemma implies that $H \in \mathbf{R}^{n \times n}$ has fewer than n linearly
independent eigenvectors. If the eigenvectors of a matrix of order n are not a basis
for $\mathbf{R}^n$ then the matrix is called defective. Hence, if H has a repeated eigenvalue it
is a defective matrix.
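
A two by two example (chosen here for illustration, not taken from the thesis) makes both lemmas concrete. The matrix below is unreduced because its sub-diagonal element is 1, and its only eigenvalue is $\theta = 1$ with algebraic multiplicity two:

$$H = \begin{bmatrix} 2 & -1 \\ 1 & 0 \end{bmatrix}, \quad \det(H - \theta I) = (\theta - 1)^2, \quad H - I = \begin{bmatrix} 1 & -1 \\ 1 & -1 \end{bmatrix}.$$

The null space of $H - I$ is spanned by $s = (1, 1)^T$, so H has a single independent eigenvector and is defective while remaining nonderogatory. The left eigenvector $u = (1, -1)^T$ satisfies $H^Tu = u$. In agreement with Lemma 2.1, the last component of s and the first component of u are both nonzero.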

Unreduced  Hessenberg  matrices  reveal  much  information  about  the  underlying 
eigen-system.  Ericsson  [29]  and  Parlett  [59,  61]  provide  an  abundance  of  results  for 
Hessenberg  matrices. 

2.2  The  Arnoldi  Factorization 

After k steps, the Arnoldi method computes

(2.2.1)  $AV_k = V_kH_k + f_ke_k^T,$

where $V_k^TV_k = I_k$ and $H_k \in \mathbf{R}^{k \times k}$ is an upper Hessenberg matrix. The vector $f_k$ is
the residual and is orthogonal to the columns of $V_k$, the Arnoldi vectors. The matrix
$H_k = V_k^TAV_k$ is the orthogonal projection of A onto the range of $V_k$. Equation (2.2.1)
defines a length k Arnoldi factorization of A. If the residual $f_k$ is the zero vector then
equation (2.2.1) is called a truncated Arnoldi factorization when k < n. Note that $f_n$
must vanish since $V_n^Tf_n = 0$ and the columns of $V_n$ form an orthogonal basis for $\mathbf{R}^n$.
In this case the Arnoldi method computes an Hessenberg decomposition.

The following classical result explains that the Arnoldi factorization is completely
specified by $v_1$.

Theorem 2.1 (Implicit Q) Let two length k Arnoldi factorizations be
given by

$AV_k = V_kH_k + f_ke_k^T,$

$AU_k = U_kG_k + r_ke_k^T,$

where $U_k$ and $V_k$ have orthonormal columns, and $G_k$ and $H_k$ are upper
Hessenberg matrices with positive sub-diagonal elements. If the first columns
of $V_k$ and $U_k$ are equal then $G_k = H_k$, $U_k = V_k$, and $r_k = f_k$.

Proof See Golub and Van Loan [35, page 367]. □

The essential hypothesis is that $H_k$ is unreduced. We note that if $H_k$ has any
negative sub-diagonal elements, a diagonal matrix $D_k$ consisting of ±1 is easily computed
so that $D_k^TH_kD_k$ has positive sub-diagonal elements. Equation (2.2.1) may
then be updated to obtain another Arnoldi factorization

$AV_kD_k = V_kD_k(D_k^TH_kD_k) + \delta_kf_ke_k^T,$

where $\delta_k = e_k^TD_ke_k$. The direction of $v_1$ is the important consideration.

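The following algorithm shows how the factorization is extended from length k to
k + p.


Algorithm 2.2

function $[V_{k+p}, H_{k+p}, f_{k+p}]$ = Arnoldi $(A, V_k, H_k, f_k, k, p)$
Input: $AV_k - V_kH_k = f_ke_k^T$ with $V_k^TV_k = I_k$, $V_k^Tf_k = 0$.

Output: $AV_{k+p} - V_{k+p}H_{k+p} = f_{k+p}e_{k+p}^T$ with $V_{k+p}^TV_{k+p} = I_{k+p}$
and $V_{k+p}^Tf_{k+p} = 0$.


1.  For $j = 1, 2, \ldots, p$

2.  $\beta_{k+j} \leftarrow \|f_{k+j-1}\|$; if $\beta_{k+j} = 0$ then stop;

3.  $v_{k+j} \leftarrow f_{k+j-1}\beta_{k+j}^{-1}$; $V_{k+j} \leftarrow [V_{k+j-1}, v_{k+j}]$;

4.  $w \leftarrow Av_{k+j}$;

5.  $h_{k+j} \leftarrow V_{k+j-1}^Tw$; $\alpha_{k+j} \leftarrow v_{k+j}^Tw$;

6.  $H_{k+j} \leftarrow \begin{bmatrix} H_{k+j-1} & h_{k+j} \\ \beta_{k+j}e_{k+j-1}^T & \alpha_{k+j} \end{bmatrix}$;

7.  $f_{k+j} \leftarrow w - V_{k+j-1}h_{k+j} - v_{k+j}\alpha_{k+j}$;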


A  few  remarks  are  in  order. 

1.  If A is symmetric, then $H_k$ is a symmetric tridiagonal matrix so that $h_{k+j} = \beta_{k+j}e_{k+j-1}$
and hence a three term recurrence may be used to compute $f_{k+j}$.

2.  If k = 0, then $V_1 = v_1$ represents the initial vector.

3.  In order to ensure that the k-th residual is numerically orthogonal to the matrix
of Arnoldi vectors $V_k$ in finite precision arithmetic, procedure Arnoldi requires
some form of re-orthogonalization at Line 7. This is the subject of Chapter 7.
Mathematically, the residual computed at Line 7 represents the projection
of $w = Av_{k+j}$ onto the orthogonal complement of $\mathcal{K}_{k+j}(A, v_1)$.

4.  In exact arithmetic, the algorithm halts only if a residual vector vanishes, i.e.
$f_j = 0$. The implications of a truncated Arnoldi factorization are discussed in
§ 2.4.

5.  If $f_j = 0$ for j < n, then the factorization may be modified to extend the
truncated factorization by using any unit vector orthogonal to the columns of
$V_j$. The unit vector becomes the (j + 1)-st Arnoldi vector and the j-th sub-diagonal
element, $\beta_{j+1}$, is zero. Although $V_n^TAV_n = H_n$ is upper Hessenberg, it
is not unreduced.
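
The recurrence of Algorithm 2.2 can be sketched in plain Python for the case k = 0, where the factorization is built from a starting vector. This is a minimal sketch under stated assumptions, not the thesis's implementation: the name `arnoldi`, the `matvec` callback, the absolute tolerance `tol` and the requirement m ≥ 1 are all choices made here. It merges $h_{k+j}$ and $\alpha_{k+j}$ into one vector, and the single extra Gram-Schmidt pass is only a simple stand-in for the re-orthogonalization that remark 3 defers to Chapter 7.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def arnoldi(matvec, v1, m, tol=1e-12):
    """Build a length-k (k <= m, m >= 1) factorization A V = V H + f e_k^T.

    matvec(v) must return A v as a list; v1 is the nonzero starting vector.
    Returns (V, H, f): V is a list of k Arnoldi vectors, H is a k-by-k upper
    Hessenberg matrix (list of rows) and f is the residual. The loop stops
    early when ||f|| <= tol, i.e. when the factorization is truncated.
    """
    nrm = math.sqrt(dot(v1, v1))
    V = [[x / nrm for x in v1]]
    H = [[0.0] * m for _ in range(m)]
    for j in range(m):
        w = matvec(V[j])                                   # w <- A v
        h = [dot(v, w) for v in V]                         # h <- V^T w
        f = [wr - sum(hi * v[r] for hi, v in zip(h, V))    # f <- w - V h
             for r, wr in enumerate(w)]
        c = [dot(v, f) for v in V]                         # one correction pass
        f = [fr - sum(ci * v[r] for ci, v in zip(c, V))
             for r, fr in enumerate(f)]
        for i in range(j + 1):
            H[i][j] = h[i] + c[i]
        beta = math.sqrt(dot(f, f))
        k = j + 1
        if k == m or beta <= tol:
            return V, [row[:k] for row in H[:k]], f
        H[k][j] = beta                                     # sub-diagonal element
        V.append([fr / beta for fr in f])                  # next Arnoldi vector

# Example: A = diag(1, 2, 3) with a starting vector that has no component
# along e_3, so the factorization truncates after two steps.
A_diag = [1.0, 2.0, 3.0]
V, H, f = arnoldi(lambda v: [a * x for a, x in zip(A_diag, v)], [1.0, 1.0, 0.0], 3)
```

In this example the residual vanishes at the second step and the eigenvalues of the two by two matrix `H` are 1 and 2, a subset of the eigenvalues of A.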

2.3  Orthogonal  Reductions  to  Hessenberg  Form 

If  Algorithm  2.2  is  used  to  compute  a  length  n  Arnoldi  factorization,  then  the  resulting 
factorization  is  also  an  Hessenberg  decomposition  of  A.  Theorem  2.1  indicates  when 
the  decomposition  is  unique.  Other  orthogonal  methods  for  computing  Hessenberg 
decompositions  are  based  upon  Givens  rotations  or  Householder  reflectors  [35,  86]. 

The Householder reduction computes a sequence of Householder reflectors $W_j$ designed
to introduce zeros in the last n − j − 1 elements of column j of $W_{j-1} \cdots W_1A$. The
product $U_n = W_1 \cdots W_{n-2}$ results in an orthogonal matrix so that $U_n^TAU_n$ is upper
Hessenberg. The first column of $U_n$ is $e_1$ so that by Theorem 2.1, the Hessenberg decomposition
computed by Algorithm 2.2 with $v_1 = e_1$ is equivalent to that computed
by the Householder reduction. Given an arbitrary unit vector $v_1$, a Householder reduction
to upper Hessenberg form is an orthogonal matrix away from being equivalent
to an Arnoldi factorization as the following result shows.

Lemma 2.3 Suppose an Hessenberg decomposition $AV_n = V_nH_n$ is computed
by Algorithm 2.2 for $A \in \mathbf{R}^{n \times n}$. If $W_0$ is an orthogonal matrix such
that $W_0e_1 = V_ne_1$, then the orthogonal decomposition $(W_0^TAW_0)U_n = U_nG_n$
is such that $W_0U_n = V_n$ and $G_n = H_n$.

Proof Let $W_0$ be an orthogonal matrix so that $W_0e_1 = V_ne_1$. Let $W_0^TAW_0U_n = U_nG_n$
be a Hessenberg decomposition. Since $W_0U_ne_1 = W_0e_1 = V_ne_1$, Theorem 2.1 gives
the necessary equalities. □

This simple observation allows us to establish a direct link between all orthogonal
reductions, or factorizations, to upper Hessenberg form. As will be seen in Chapter 5,
the various methods for computing the decomposition may produce drastically different
results when computing in finite precision arithmetic.

2.4  Truncated  Arnoldi  Factorizations 

The following section is concerned with finding conditions for the Arnoldi method
terminating prematurely. This is a welcome event since if $AV_m = V_mH_m$ is a truncated
Arnoldi factorization of length m, the eigenvalues of $H_m$ are a subset of those of A.
Indeed, if $H_mZ_m = Z_mT_m$ is a real Schur decomposition, then $A(V_mZ_m) = (V_mZ_m)T_m$
is a partial one for A. A few results are needed before a theorem stating necessary
and sufficient conditions for a truncated factorization is presented.

The first result needed is a slight modification of Theorem 7.4.3 proved in Golub
and Van Loan [35]. It allows us to establish a connection between the Krylov matrix

$K_m(A, v_1) = \begin{bmatrix} v_1 & Av_1 & \cdots & A^{m-1}v_1 \end{bmatrix}$

and an Arnoldi factorization.

Theorem 2.3 Suppose $Q \in \mathbf{R}^{n \times n}$ is orthogonal and $A \in \mathbf{R}^{n \times n}$ such
that $AQ = QH$ is an upper Hessenberg decomposition. Partition $Q = [Q_m, Q_{n-m}]$
where $Q_m \in \mathbf{R}^{n \times m}$ and set $H_m = Q_m^TAQ_m$.

Then $H_m$ is an unreduced Hessenberg matrix if and only if
$Q_m^TK_m(A, v_1) = R_m \in \mathbf{R}^{m \times m}$
is nonsingular and upper triangular, for m = 1, ..., n.

Proof  Let  AQ  =  QH  be  an  Arnoldi  factorization  of  length  n.  Partition  Q  = 
[Qm,Qn-m]  where  Qrn  €  R"x”1  and  set  H1n  =  QjnAQm.  Note  that  QTAjv1  = 
QT  AQ  ■  ■  ■  QTAQe !  =  Hje i  for  j  =  (),...,  n  -  1,  and  then 

QTKn{A,v1)  =  [QTvx  QtAVi  ...  QTAn~1v1  ]  , 

=  eq  He,  ...  Hn~xe,  ]  , 

=  H, 


(2.4.1) 
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is  an  upper  triangular  of  matrix  order  n.  Thus  QjnKm(A,v\)  =  Rrn  is  the  leading 
principal  sub-matrix  of  R  of  order  rn. 

Suppose  that  H7n  is  an  unreduced  Hessenberg  matrix.  The  diagonal  elements  of 
Rm  are  =  ftift2  •  •  •  fti  for  i  —  1, . . . ,  m  with  ft i  =  1.  The  non-singularity  of  Rm 

now  follows. 

For the converse, suppose that $R_m \in R^{m \times m}$ is nonsingular and upper triangular.
Since $H^j e_1 \in \mathrm{Span}\{e_1, \ldots, e_{j+1}\}$ is a linear combination of the first $j$ columns of $H$
it follows from equation (2.4.1) that $R_m e_{j+1} = H_m R_m e_j$ for $j = 1, \ldots, m-1$. Since
$R_m$ is nonsingular and upper triangular, all its diagonal elements are nonzero. To
show that $H_m$ is an unreduced upper Hessenberg matrix, consider $e_{j+1}^T H_m e_j$ for
$j = 1, \ldots, m-1$. Since $R_m e_{j+1} = H_m R_m e_j$ it follows that

$e_{j+1}^T R_m e_{j+1} = e_{j+1}^T H_m R_m e_j = \sum_{i=1}^{m} (e_{j+1}^T H_m e_i)(e_i^T R_m e_j) = (e_{j+1}^T H_m e_j)(e_j^T R_m e_j)$

because $e_i^T R_m e_j = 0$ for $i > j$ and $e_{j+1}^T H_m e_i = 0$ for $i < j$. Thus
$e_{j+1}^T H_m e_j = e_{j+1}^T R_m e_{j+1} / e_j^T R_m e_j \neq 0$ for $j = 1, \ldots, m-1$ since by assumption the diagonal
elements of $R_m$ are nonzero. □
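Theorem 2.3 can be checked numerically on a small example. The following is a minimal NumPy sketch, not code from the thesis: the helper `arnoldi`, the random test matrix, and the tolerances are illustrative assumptions. It builds a length-$m$ Arnoldi factorization, forms the Krylov matrix explicitly, and confirms that $V_m^T K_m(A, v_1)$ is upper triangular with diagonal entries equal to products of the sub-diagonal elements of $H_m$.

```python
import numpy as np

def arnoldi(A, v1, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T.

    Uses classical Gram-Schmidt with one reorthogonalization pass.
    (Hypothetical helper for illustration only.)
    """
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w           # Gram-Schmidt coefficients
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w           # reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w                       # w is the residual f_k

rng = np.random.default_rng(0)
n, m = 6, 4
A = rng.standard_normal((n, n))
v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)

V, H, f = arnoldi(A, v1, m)

# Krylov matrix K_m(A, v1) = [v1, A v1, ..., A^(m-1) v1]
K = np.column_stack([np.linalg.matrix_power(A, j) @ v1 for j in range(m)])
R = V.T @ K

beta = np.diag(H, -1)                                 # sub-diagonal of H_m
expected_diag = np.concatenate(([1.0], np.cumprod(beta)))
strictly_lower = np.tril(R, -1)                       # should vanish
```

Forming the Krylov matrix explicitly is only reasonable for very small $m$, since its columns quickly become nearly linearly dependent.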

Theorem 2.3 implies that the residual $f_m$ vanishes at the first step $m$ such that
the dimension of $\mathcal{K}_{m+1}(A, v_1)$ is equal to $m$ and hence is guaranteed to vanish for
some $m \le n$.

The monic polynomial $\psi(\lambda)$ of smallest degree such that $\psi(A) v_1 = 0$ is called the
minimal polynomial of $A$ with respect to $v_1$. The degree of the minimal polynomial
of $A$ with respect to $v_1$ is called the grade of $v_1$. Suppose that the grade of $v_1$ is
$m$. Define $C_m = [e_2, \ldots, e_m, c_m] \in R^{m \times m}$ where $c_m$ is the solution of the linear
system $K_m(A, v_1) c_m = -A^m v_1$. We note that such a solution exists since $\psi_m(\lambda) =
\lambda^m + \lambda^{m-1} e_m^T c_m + \cdots + e_1^T c_m$ is the minimal polynomial of $A$ with respect to $v_1$. It
follows that

(2.4.2)  $A K_m(A, v_1) = K_m(A, v_1) C_m$.

The matrix $C_m$ is called a companion matrix. If we assume that the diagonal elements
of $R_m$ are non-zero, Theorem 2.3 implies that $H_m$ is unreduced. From equation (2.4.2)
the identity $A Q_m = Q_m (R_m C_m R_m^{-1})$ follows. By the Implicit Q theorem, $Q_m =
V_m$ and $H_m = R_m C_m R_m^{-1}$ since the first columns of $Q_m$ and $V_m$ are equal. From
equation (2.4.2) it follows that the characteristic polynomial for $C_m$ is equal to the
minimal polynomial of $A$ with respect to $v_1$.




Ruhe [69] shows that $A K_j(A, v_1) = K_j(A, v_1) C_j$ where $C_j = [e_2, \ldots, e_j, c_j] \in R^{j \times j}$
for $j < m$. The vector $c_j \in R^j$ solves the least squares problem

(2.4.3)  $\min_{c \in R^j} \| A^j v_1 - K_j(A, v_1) c \| = \| A^j v_1 - K_j(A, v_1) c_j \|$.

Denote the residual of the least squares problem by $r_j$ and note that $r_j = \psi_j(A) v_1$
where

$\psi_j(\lambda) = \lambda^j - [\, 1 \;\; \lambda \;\; \cdots \;\; \lambda^{j-1} \,] c_j = \det(C_j - \lambda I_j)$.

It follows that

$A K_j(A, v_1) = K_j(A, v_1) C_j + r_j e_j^T$.

If $A V_j = V_j H_j + f_j e_j^T$ is an Arnoldi factorization of length $j$ with $V_j e_1 = v_1$ then
Theorem 2.3 implies that $f_j e_j^T = r_j e_j^T R_j^{-1}$. Hence

(2.4.4)  $f_j = (e_j^T R_j e_j)^{-1} r_j = (e_j^T R_j e_j)^{-1} \psi_j(A) v_1$.

Saad [75] uses projection arguments to show that $\psi_j(A)$ minimizes $\| \psi(A) v_1 \|$ over
all monic polynomials $\psi$ of degree $j$. This property is also a direct consequence of
equation (2.4.3).
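The relation between the Arnoldi residual and the least squares residual in equation (2.4.4) can be illustrated as follows. This is a hedged NumPy sketch under stated assumptions: the helper `arnoldi`, the random data, and the choice $j = 3$ are illustrative and not taken from the thesis. It solves the least squares problem (2.4.3) with `numpy.linalg.lstsq` and compares its residual with the Arnoldi residual scaled by the product of the sub-diagonal elements of $H_j$.

```python
import numpy as np

def arnoldi(A, v1, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T (illustrative helper)."""
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w           # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

rng = np.random.default_rng(0)
n, j = 6, 3
A = rng.standard_normal((n, n))
v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)

V, H, f = arnoldi(A, v1, j)

# Least squares problem (2.4.3): min_c || A^j v1 - K_j c ||
K = np.column_stack([np.linalg.matrix_power(A, i) @ v1 for i in range(j)])
b = np.linalg.matrix_power(A, j) @ v1
c, *_ = np.linalg.lstsq(K, b, rcond=None)
r = b - K @ c                            # least squares residual r_j

# e_j^T R_j e_j equals the product of the sub-diagonal elements of H_j
scale = np.prod(np.diag(H, -1))
```

Under these assumptions `r` and `scale * f` agree to rounding error, which is the content of (2.4.4).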

The following theorem summarizes the preceding discussion on the various relationships
between an Arnoldi factorization and Krylov matrices. We remark that the
previous discussion is in the spirit of that presented by Sorensen [83, pages 360-362].

Theorem 2.4 Suppose the integer $m$ is the grade of the unit vector $v_1$
with respect to $A$. Let a sequence of Arnoldi factorizations be given by
$A V_j = V_j H_j + f_j e_j^T$ for $j \le m$ where $V_j e_1 = v_1$. If $K_j(A, v_1) = Q_j R_j$ where
$R_j$ is upper triangular then $\psi_j(\lambda) = \det(C_j - \lambda I_j)$ solves

$\min \| \psi(A) v_1 \|$

over all monic polynomials of degree $j$. Moreover, $C_j = [e_2, \ldots, e_j, c_j] \in
R^{j \times j}$ is the companion matrix for $H_j$ where $A K_j(A, v_1) = K_j(A, v_1) C_j +
r_j e_j^T$ with

$\beta_2 \cdots \beta_j f_j = \psi_j(A) v_1 = r_j$

and if the diagonal elements of $R_j$ are positive, then $H_j R_j = R_j C_j$,
$V_j = Q_j$, and $e_j^T R_j e_j = \beta_2 \cdots \beta_j$.

We now state the main result of the section indicating when an exact truncated
factorization occurs. This is desirable since the columns of $V_k$ form a basis for an
invariant subspace and the eigenvalues of $H_k$ are a subset of those of $A$.

Theorem 2.5 Let equation (2.2.1) define a $k$-step Arnoldi factorization
of $A$, with $H_k$ unreduced. Then $f_k = 0$ if and only if $v_1 = Q_k y$ where
$A Q_k = Q_k R_k$ with $Q_k^T Q_k = I_k$, and $R_k$ an upper quasi-triangular matrix
of order $k$.

Proof If $f_k = 0$ then $A V_k = V_k H_k$. Let $H_k Z_k = Z_k R_k$ be a real Schur decomposition
where $Z_k^T Z_k = I_k$ and $R_k \in R^{k \times k}$ is an upper quasi-triangular matrix. Then

$v_1 = V_k e_1 = V_k Z_k Z_k^T e_1 = Q_k y$,

where $y = Z_k^T e_1$ and $V_k Z_k = Q_k$. Note that $A Q_k = Q_k R_k$.

Conversely, suppose that $A Q_k = Q_k R_k$ with $Q_k^T Q_k = I_k$ and $R_k$ is an upper
quasi-triangular matrix of order $k$. Let $v_1 = Q_k y$ with $y \in R^k$ arbitrary. Now, for
any integer $m > 0$, $A^m Q_k = Q_k R_k^m$ and thus

$A^m v_1 = A^m Q_k y = Q_k R_k^m y \in \mathcal{R}(Q_k)$.

Hence the dimension of the Krylov subspace $\mathcal{K}_m(A, v_1)$ is at most $k$. Since $H_k$ is
unreduced, Theorem 2.3 implies that the dimension of $\mathcal{K}_{k+1}(A, v_1)$ is $k$ and hence
$f_k = 0$. □
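The "if" direction of Theorem 2.5 is easy to observe in floating point. The sketch below is illustrative only: the helper `arnoldi`, the construction of $A$ from a prescribed upper triangular factor, and the tolerances are assumptions, not part of the thesis. The starting vector is placed in a known $k$-dimensional invariant subspace, and the residual of the $k$-step factorization is then at the level of rounding error.

```python
import numpy as np

def arnoldi(A, v1, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T (illustrative helper)."""
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w           # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

rng = np.random.default_rng(1)
n, k = 8, 3

# A = Q T Q^T with T upper triangular, so A Q_k = Q_k T_k for the first k columns.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
T = np.triu(rng.standard_normal((n, n)), 1) + np.diag(np.arange(1.0, n + 1))
A = Q @ T @ Q.T

# Starting vector v1 = Q_k y lies in the invariant subspace R(Q_k).
v1 = Q[:, :k] @ rng.standard_normal(k)

V, H, f = arnoldi(A, v1, k)
residual_norm = np.linalg.norm(f)
ritz_values = np.sort(np.linalg.eigvals(H).real)   # expected: 1, 2, 3
```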

The theorem's hypothesis indicates that the range of $Q_k$ represents an invariant
subspace for $A$. The diagonal blocks of $R_k$ contain the eigenvalues of $A$. The complex
conjugate pairs are in blocks of order two and the real eigenvalues are on the diagonal
of $R_k$, respectively. The matrix equation $A Q_k = Q_k R_k$ is a partial real Schur
decomposition of order $k$ for $A$. In particular, if the initial vector is a linear combination
of $k$ linearly independent eigenvectors then the $k$-th residual vector vanishes. It is
therefore desirable to devise a method that forces the starting vector $v_1$ to be a
linear combination of Schur vectors corresponding to wanted eigenvalues.

Theorem 2.5 gives conditions for the Arnoldi factorization to prematurely
terminate. Computing in finite precision arithmetic blurs the exact conditions of the
theorem. The Implicit Q theorem and the results of § 2.3 show that all orthogonal
reductions to upper Hessenberg form are related. Thus the optimality property of Saad,
and Ruhe's characterization of the Arnoldi factorization are fundamental results
concerning the reduction of a matrix to upper Hessenberg form. Ruhe's analysis forms
the basis of a perturbation theory for the Hessenberg reduction that is presented in
Chapter 5. In particular, the theory developed determines the sensitivity, or degree
of forward instability, of an Arnoldi or QR iteration upon the starting vector.

2.5  Stopping  Criteria 

This section considers the important question of determining when a length $k$ Arnoldi
factorization has computed approximate eigenvalues. If the norm of $f_k$ is small, the
$k$ eigenvalues of $H_k$ are approximations to $k$ eigenvalues of $A$. Numerical experience
indicates that $\|f_k\|$ rarely becomes small, let alone zero. Nevertheless, some of the
eigenvalues of $H_k$ may be good estimates of the eigenvalues of $A$. Since the interest
is in a small subset of the eigensystem of $A$, alternate criteria that allow termination
for $k < n$ are needed. Let $H_k s = s \theta$ where $\|s\| = 1$. Define the vector $x_r = V_k s$ to
be a Ritz vector and the scalar $\theta$ to be a Ritz value. Then

(2.5.1)  $\| A V_k s - V_k H_k s \| = \| A x_r - x_r \theta \| = \|f_k\| \, |e_k^T s|$

indicates that if the last component of an eigenvector for $H_k$ is small, the Ritz pair
$(x_r, \theta)$ is an approximation to an eigenpair of $A$. We note that by Lemma 2.1, $|e_k^T s| > 0$
if $H_k$ is unreduced. This pair is exact for a nearby problem: it is easily shown that
$(A + E) x_r = x_r \theta$ with $E = -(e_k^T s) f_k x_r^T$. The advantage of using the Ritz estimate
$\|f_k\| \, |e_k^T s|$ is to avoid explicit formation of the direct residual $A V_k s - V_k s \theta$ when
assessing the numerical accuracy of an approximate eigenpair. We remark that a
small $\|E\|$ does not imply that the Ritz pair $(x_r, \theta)$ is an accurate approximation to
an eigenpair $(x, \lambda)$ of $A$. The perturbation theory presented in § 5.2 of Chapter 5
considers these accuracy issues.
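The Ritz estimate of equation (2.5.1) and the backward perturbation $E$ can be checked directly. This is a minimal sketch, assuming a random unsymmetric test matrix and the illustrative helper `arnoldi`; neither is from the thesis. Because the eigenvectors of $H_k$ may be complex for unsymmetric $A$, the sketch uses the conjugate transpose $x_r^H$ in place of $x_r^T$, which coincides with the formula in the text when $s$ is real.

```python
import numpy as np

def arnoldi(A, v1, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T (illustrative helper)."""
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w           # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

rng = np.random.default_rng(2)
n, k = 10, 5
A = rng.standard_normal((n, n))
V, H, f = arnoldi(A, rng.standard_normal(n), k)

theta_all, S = np.linalg.eig(H)          # columns of S have unit 2-norm
theta, s = theta_all[0], S[:, 0]
x = V @ s                                # Ritz vector

direct_residual = np.linalg.norm(A @ x - theta * x)
ritz_estimate = np.linalg.norm(f) * abs(s[-1])

# Backward perturbation: (A + E) x = x theta with E = -(e_k^T s) f x^H
E = -s[-1] * np.outer(f, x.conj())
backward_residual = np.linalg.norm((A + E) @ x - theta * x)
```

The estimate costs only the norm of $f_k$ and one component of $s$, whereas the direct residual needs an additional matrix-vector product with $A$.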

Recent work by Chatelin and Frayssé [18, 19] and Godet-Thobie [34] suggests
that when $A$ is highly non-normal, the size of $e_k^T s$ is not an appropriate guide for
detecting convergence. If the relative departure from normality defined by the Henrici
number $\| A A^T - A^T A \|_F / \| A^2 \|_F$ is large, the matrix $A$ is considered highly
non-normal. Assuming that $A$ is diagonalizable, a large Henrici number implies that the
basis of eigenvectors is ill-conditioned [18]. Bennani and Braconnier compare the
use of the Ritz estimate and direct residual $\| A x_r - x_r \theta \|$ in Arnoldi algorithms [12].
They suggest normalizing the Ritz estimate by the norm of $A$, resulting in a stopping
criterion based on the backward error. The backward error is defined as the smallest, in
norm, perturbation $\Delta A$ such that the Ritz pair is an eigenpair for $A + \Delta A$. Scott [80]
presents a lucid account of the many issues involved in determining stopping criteria
for the unsymmetric problem.
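The Henrici number quoted above is inexpensive to evaluate for small dense matrices. The following sketch is an illustration only; the function name `henrici_number` and the two test matrices are assumptions made here, not part of the thesis.

```python
import numpy as np

def henrici_number(A):
    """Relative departure from normality ||A A^T - A^T A||_F / ||A^2||_F."""
    A = np.asarray(A, dtype=float)
    commutator = A @ A.T - A.T @ A
    return np.linalg.norm(commutator, "fro") / np.linalg.norm(A @ A, "fro")

# A symmetric (hence normal) matrix has Henrici number zero.
symmetric = np.array([[2.0, 1.0],
                      [1.0, 3.0]])

# A matrix with a large off-diagonal coupling is far from normal.
jordan_like = np.array([[1.0, 100.0],
                        [0.0, 1.0]])

h_sym = henrici_number(symmetric)
h_jordan = henrici_number(jordan_like)
```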

2.6  Convergence  Properties  of  Krylov  Spaces 

In this section, we consider the rate at which the eigenvalues of $H_m$ emerge as
approximations to those of $A$ as $m$ increases towards $n$. Since $H_m$ is the projection of
$A$ with respect to the columns of $V_m$, Saad [74] proposes studying the convergence of
the two residuals $(A - \theta I) x_r$ or $(V_m H_m V_m^T - \lambda I_n) x$, for some eigenpair $(x, \lambda)$ of $A$,
to zero. Indeed, the former residual is that used in equation (2.5.1) of the previous
section. Saad [78] uses the latter residual to obtain the inequality

(2.6.1)  $\| (H_m - \lambda I_m)(V_m^T x) \| \le \gamma_m \frac{\| (I - V_m V_m^T) x \|}{\| V_m V_m^T x \|}$,

where

$\gamma_m = \| V_m V_m^T A (I - V_m V_m^T) \| \le \|A\|$.

The quality of the approximation afforded by $V_m$ and $H_m$ is governed by the tangent
of the angle between $\mathcal{K}_m(A, V_m e_1)$ and $x$, which is given by the ratio on the right hand
side of equation (2.6.1). Thus, the size of the numerator $\| (I - V_m V_m^T) x \|$, the sine
of the angle between $\mathcal{K}_m(A, V_m e_1)$ and $x$, is the quantity to estimate. The following
theorem, which we state without proof, is due to Saad [75].

Theorem 2.6 Assume that $A$ is diagonalizable with eigenpairs $(x_j, \lambda_j)$
where each eigenvector is of unit length. If $V_m e_1 = x_1 \zeta_1 + \cdots + x_n \zeta_n$ with
$\zeta_1 \neq 0$, then there exist $m$ eigenvalues $\lambda_2, \ldots, \lambda_{m+1}$ of $A$ such that

(2.6.2)  $\| (I - V_m V_m^T) x_1 \| \le \epsilon^{(m)} \sum_{j=2}^{n} \frac{|\zeta_j|}{|\zeta_1|}$

where

$\epsilon^{(m)} = \left( \sum_{j=2}^{m+1} \prod_{l=2,\, l \neq j}^{m+1} \frac{|\lambda_l - \lambda_1|}{|\lambda_l - \lambda_j|} \right)^{-1}$.

If $|\lambda_1 - \lambda_l| > |\lambda_j - \lambda_l|$ for $j, l = 2, \ldots, m+1$ then $\epsilon^{(m)} < 1$. The geometrical
interpretation is that $A$'s extremal eigenvalues that are well separated emerge as
eigenvalues of $H_m$. This generalizes the well known convergence behavior of the
Lanczos iteration [61, 73].

The constant multiplying $\epsilon^{(m)}$, consisting of the normalized sum of expansion
coefficients on the right hand side of equation (2.6.2), reflects the possible ill-conditioning
of the matrix of eigenvectors for $A$. This may be seen as follows. Suppose the left
eigenvector [35, 101] corresponding to the right eigenvector $x_j$ is denoted by $y_j$, for
$j = 1, \ldots, n$ where $\lambda_j$ is distinct from the other eigenvalues. Assume the left
eigenvector is also of unit length. Using the Cauchy-Schwarz inequality, it follows that
$|\zeta_i| \, |y_i^T x_i| = |y_i^T v_1| \le \|y_i\| \, \|v_1\| = 1$, giving $|\zeta_i| \le \sec \varphi_i$ where $\varphi_i$ measures the angle
between the corresponding left and right eigenvector. If the eigenvalue $\lambda_i$ is poorly
conditioned, then $\sec \varphi_i$ is large and possibly so is the coefficient $|\zeta_i|$. If we assume
that the eigenvalues of $A$ are distinct, then

$\sum_{j=2}^{n} \frac{|\zeta_j|}{|\zeta_1|} \le \sum_{j=2}^{n} \frac{\sec \varphi_j}{|\zeta_1|}$

may be quite large. The conclusion is that a large factorization may need to be
built for poorly conditioned eigenvalue problems in order for good estimates of $A$'s
eigenvalues to emerge in $H_m$. In addition, if $A$ is defective, it may not possess a basis
of eigenvectors. Numerically, problems are encountered when a basis for the desired
invariant subspace is poorly conditioned. The recent thesis of Jia [43] extends Saad's
results without the assumption that $A$ is diagonalizable.

Finally, we end with a theorem that combines $\gamma_m$ and $\beta_{m+1}$ to estimate how close
$\mathcal{K}_m(A, V_m e_1)$ is to an invariant subspace of $A$. But first we provide a brief motivation
for the theorem.

Suppose that $Z = [\, Z_1 \;\; Z_2 \,]$ is an orthonormal matrix where the columns of
$Z_1 \in R^{n \times m}$ span an invariant subspace for $A$. Partition

$Z^T A Z = \begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}$

where $A_{ij} = Z_i^T A Z_j$ for $i, j = 1, 2$. Since $\mathcal{R}(Z_1)$ is invariant under $A$, there exists a
matrix $G_1 \in R^{m \times m}$ so that $A Z_1 = Z_1 G_1$. Thus, $A_{21} = Z_2^T A Z_1 = Z_2^T Z_1 G_1 = 0$ since
$Z$ is orthonormal.

Stewart [85] considers the interesting question of how close $Z_1$ is to an invariant
subspace of $A$ if $\|A_{21}\|$ is small instead of zero. For example, if $\hat{Z}$ and $\hat{Z}^T A \hat{Z}$ are
partitioned conformably with $Z$ and $Z^T A Z$, respectively, can an orthonormal matrix
$Y$ deviating little from $I_n$ be found so that $\hat{Z} = Z Y$? Stewart chooses

$Y = \begin{bmatrix} I_m & -P^T \\ P & I_{n-m} \end{bmatrix} \begin{bmatrix} (I_m + P^T P)^{-1/2} & 0 \\ 0 & (I_{n-m} + P P^T)^{-1/2} \end{bmatrix}$

where $P \in R^{(n-m) \times m}$ and since both $I_m + P^T P$ and $I_{n-m} + P P^T$ are positive definite
and symmetric matrices, the square roots are uniquely defined. The answer to whether
the column space of $Z_1$ is an accurate approximation to that of $\hat{Z}_1$ becomes that of
analyzing the interaction among the matrices $P$ and $A_{ij}$ for $i, j = 1, 2$. The analysis
presented by Stewart gives the following interesting interpretation with respect to an
Arnoldi factorization.


Theorem 2.7 Suppose that $A V_m = V_m H_m + f_m e_m^T$ is a length $m$ Arnoldi
factorization that is extended to a Hessenberg decomposition of $A$:

$A \, [\, V_m \;\; V_{n-m} \,] = [\, V_m \;\; V_{n-m} \,] \begin{bmatrix} H_m & M_m \\ \beta_{m+1} e_1 e_m^T & H_{n-m} \end{bmatrix}$

where $\beta_{m+1} = \|f_m\|$. Let

$\delta_m = \mathrm{sep}(H_m, H_{n-m}) = \min_{X \neq 0} \frac{\| X H_m - H_{n-m} X \|_F}{\| X \|_F}$.

If $4 \beta_{m+1} \gamma_m < \delta_m^2$, then there is a matrix $P$ that satisfies the bound

$\|P\| < \frac{2 \beta_{m+1}}{\delta_m}$

so that the columns of $Q_m = (V_m + V_{n-m} P)(I + P^T P)^{-1/2}$ are an orthogonal
basis for an invariant subspace of $A$.

Proof A simple derivation shows that $\gamma_m = \| V_m V_m^T A (I - V_m V_m^T) \| = \|M_m\|$. The
conclusion now follows directly from Theorem 4.1 of Stewart [85]. □

The size of $\gamma_m$ measures the amount of coupling between $\mathcal{R}(V_m)$ and $\mathcal{R}(V_{n-m})$.
The reciprocal of $\delta_m$ measures the sensitivity of $\mathcal{R}(Q_m)$ as an invariant subspace.
It may be shown that

$\mathrm{sep}(H_m, H_{n-m}) \le \min_{k,l} | \lambda_k(H_m) - \lambda_l(H_{n-m}) |$.

Moreover, Varah [94] shows that if the matrices involved are highly non-normal, the
smallest difference between the spectra of $H_m$ and $H_{n-m}$ may be an overestimate
of the actual separation.

Theorem 2.7 shows the dependence of $\beta_{m+1}$ upon $\gamma_m$ and $\delta_m$ in determining
the quality of $\mathcal{R}(V_m)$ as an eigenspace of $A$. Since $V_m^T Q_m = (I + P^T P)^{-1/2}$,
Stewart [85] shows that the singular values of $P$ are the tangents of the canonical, or
principal, angles [18, 35, 85] between the two spaces spanned by the columns of $V_m$
and $Q_m$, respectively.
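For small blocks the separation in Theorem 2.7 can be evaluated from its definition. The sketch below is an illustrative assumption, not the thesis's method: it writes the map $X \mapsto X H_m - H_{n-m} X$ as a Kronecker-structured matrix and takes its smallest singular value, which equals the minimum of the Frobenius-norm ratio. The function names and test matrices are hypothetical.

```python
import numpy as np

def sep(H1, H2):
    """sep(H1, H2) = min over X != 0 of ||X H1 - H2 X||_F / ||X||_F.

    With column-major vec, vec(X H1 - H2 X) = (H1^T kron I - I kron H2) vec(X),
    so sep is the smallest singular value of that matrix.
    """
    m, q = H1.shape[0], H2.shape[0]
    S = np.kron(H1.T, np.eye(q)) - np.kron(np.eye(m), H2)
    return np.linalg.svd(S, compute_uv=False).min()

def min_eigenvalue_gap(H1, H2):
    """Smallest distance between an eigenvalue of H1 and one of H2."""
    l1 = np.linalg.eigvals(H1)
    l2 = np.linalg.eigvals(H2)
    return np.abs(l1[:, None] - l2[None, :]).min()

# Normal (diagonal) blocks: sep equals the eigenvalue gap.
D1, D2 = np.diag([1.0, 2.0]), np.diag([5.0, 7.0])
sep_normal, gap_normal = sep(D1, D2), min_eigenvalue_gap(D1, D2)

# Non-normal blocks: the eigenvalue gap only bounds sep from above.
N1 = np.array([[1.0, 50.0], [0.0, 2.0]])
N2 = np.array([[4.0, 50.0], [0.0, 5.0]])
sep_nonnormal, gap_nonnormal = sep(N1, N2), min_eigenvalue_gap(N1, N2)
```

The Kronecker matrix has order $m(n-m)$, so this direct evaluation is only practical for small blocks.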

Unfortunately, both Theorems 2.6 and 2.7 require information about $A$ that is not
readily available. In addition, Theorem 2.7 requires that the sub-diagonal element
$\beta_{m+1}$ of $H$ be small relative to $\delta_m$ and $\gamma_m$. The next two chapters give conditions
under which we can expect this situation to occur.

Chapter 3


The  QR  Algorithm 


The QR algorithm is a general purpose method for computing all the eigenvalues of
dense matrices. The LR-iteration of Rutishauser [72], based on a sequence of triangular
similarity transformations, preceded its discovery. The QR algorithm, developed
independently by both Francis [32] and Kublanovskaya [45], instead uses a sequence
of orthogonal similarity transformations. The algorithm iteratively computes an
approximation to the real Schur decomposition.

The chapter first examines the explicitly shifted iteration and some of its fundamental
properties in § 3.1. The convergence of the iteration is considered in § 3.2.
The well known duality of the QR iteration and inverse iteration is interpreted in
terms of Krylov subspaces in § 3.3. The practical QR algorithm is the subject of
§ 3.4, which includes a discussion of the implicitly shifted version. There is a wealth of
excellent material on the QR algorithm. Thorough introductions are given by Golub
and Van Loan [35], Parlett [61], Stewart [86] and Watkins [97, 98]. More advanced
treatments include those by Parlett and Poole [65], Watkins and Elsner [100], and
Wilkinson [101].

For the remainder of the chapter we assume that $A$ is factored into $AU = UH$
where $H$ is an unreduced upper Hessenberg matrix and $U^T U = I_n$. There is no loss
of generality since if $H$ is reduced then for some $1 \le j < n$,

$H = \begin{bmatrix} H_j & M_j \\ 0 & H_{n-j} \end{bmatrix}$

where $H_j$ is an unreduced Hessenberg matrix. The eigenvalues of $H$ are the eigenvalues
of $H_j$ and $H_{n-j}$ so that we may work with $H_j$ and then in turn $H_{n-j}$. We remark
that if Schur vectors or eigenvectors are desired for any of the eigenvalues of $H_{n-j}$,
the sub-matrix $M_j$ is required.


3.1  Explicitly  Shifted  QR  Iteration 

The explicitly shifted QR-iteration is defined by

Algorithm 3.1

Input: $H^{(1)} = H$ an unreduced upper Hessenberg matrix, and a sequence
of shifts $\{\tau_j\}_{j=1}^{p}$.

Output: $H^{(p+1)}$ and $Z^{(p)} \leftarrow Q^{(1)} \cdots Q^{(p)}$.

1.1 For $j = 1, \ldots, p$

2.1 Compute the QR factorization:

$Q^{(j)} R^{(j)} = H^{(j)} - \tau_j I$;

2.2 $H^{(j+1)} \leftarrow R^{(j)} Q^{(j)} + \tau_j I$
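Algorithm 3.1 translates almost line by line into NumPy. The following is a minimal sketch restricted to real shifts; the function name `shifted_qr`, the random Hessenberg test matrix, and the shift values are illustrative assumptions. It also accumulates $Z^{(p)} = Q^{(1)} \cdots Q^{(p)}$ and $T^{(p)} = R^{(p)} \cdots R^{(1)}$ so that the similarity $H Z^{(p)} = Z^{(p)} H^{(p+1)}$ and the factorization of the shift polynomial can be inspected.

```python
import numpy as np

def shifted_qr(H, shifts):
    """Explicitly shifted QR iteration with real shifts.

    Returns H^(p+1), Z^(p) = Q^(1)...Q^(p) and T^(p) = R^(p)...R^(1).
    """
    n = H.shape[0]
    Hj = H.copy()
    Z = np.eye(n)
    T = np.eye(n)
    for tau in shifts:
        Q, R = np.linalg.qr(Hj - tau * np.eye(n))   # line 2.1
        Hj = R @ Q + tau * np.eye(n)                # line 2.2
        Z = Z @ Q
        T = R @ T
    return Hj, Z, T

rng = np.random.default_rng(0)
n = 6
H = np.triu(rng.standard_normal((n, n)), -1)        # upper Hessenberg
shifts = [0.5, -1.0, 2.0]

H_next, Z, T = shifted_qr(H, shifts)

# Shift polynomial evaluated at H: (H - tau_1 I)...(H - tau_p I)
P_of_H = np.eye(n)
for tau in shifts:
    P_of_H = P_of_H @ (H - tau * np.eye(n))
```

With complex shifts the factors would be unitary rather than orthogonal; the sketch does not cover that case.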

One cycle of the iteration is said to be a QR step. Some of the most important
properties in a QR step are summarized with the following lemma.

Lemma 3.1 Let $H - \tau I = QR$ be a QR factorization where $H$ is an
unreduced upper Hessenberg matrix and denote $\rho_i = e_i^T R e_i$. Then the
following properties hold:

1. $Q$ is an upper Hessenberg matrix.

2. $\rho_i \neq 0$ for $i = 1, \ldots, n-1$.

3. $\rho_n = 0$ if and only if $\tau$ is an eigenvalue of $H$.

4. $e_n^T (RQ + \tau I) = \tau e_n^T$ if and only if $\tau$ is an eigenvalue of $H$.

Proof A sequence of plane rotations $G_{i,i+1}$ are easily constructed so that

$G_{n-1,n}^H \cdots G_{1,2}^H (H - \tau I)$

is upper triangular [35, page 215]. Each $G_{i,i+1}^H$ is designed to annihilate the
$(i+1, i)$ entry of $G_{i-1,i}^H \cdots G_{1,2}^H (H - \tau I)$. The product $G_{1,2} \cdots G_{n-1,n}$ is upper
Hessenberg and $G_{n-1,n}^H \cdots G_{1,2}^H (H - \tau I)$ is upper triangular. Set $Q = G_{1,2} \cdots G_{n-1,n}$
and $R = Q^H (H - \tau I)$. Note that $Q$ is an upper Hessenberg matrix.

A simple derivation shows that $e_{i+1}^T H e_i = (e_{i+1}^T Q e_i) \rho_i$. Since $H$ is an unreduced
upper Hessenberg matrix, $0 < |e_{i+1}^T H e_i| = |e_{i+1}^T Q e_i| \, |\rho_i| \le |\rho_i|$ for $i = 1, \ldots, n-1$,
establishing the second property.

The matrix $H - \tau I$ is singular if and only if $\tau$ is an eigenvalue of $H$. The third
property follows immediately since $\det(H - \tau I) = \det(R) = \rho_1 \cdots \rho_n$ is zero if and
only if $\rho_n$ is.

The third property gives $\rho_n = 0$ if $\tau$ is an eigenvalue of $H$. Since $e_n^T R = \rho_n e_n^T$ the
final property holds. □
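The properties in Lemma 3.1 can be observed on a small symmetric tridiagonal matrix, whose eigenvalues are real so that an exact shift stays in real arithmetic. This is an illustrative sketch only; the test matrix and tolerances are assumptions. A single QR step with the shift equal to a computed eigenvalue drives $\rho_n$ to rounding level and leaves $\tau e_n^T$ in the last row.

```python
import numpy as np

n = 6
# Symmetric tridiagonal (hence unreduced upper Hessenberg) test matrix.
H = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

tau = np.linalg.eigvalsh(H)[0]            # shift equal to an eigenvalue
Q, R = np.linalg.qr(H - tau * np.eye(n))
H_next = R @ Q + tau * np.eye(n)

rho = np.diag(R)
e_n = np.eye(n)[-1]

q_below_subdiagonal = np.tril(Q, -2)      # property 1: Q is upper Hessenberg
last_row_error = np.linalg.norm(H_next[-1] - tau * e_n)   # property 4
```

In exact arithmetic $\rho_n$ would be exactly zero; in floating point it is only as small as the accuracy of the computed shift allows.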

The lemma allows us to conclude that all $H^{(j)}$ remain upper Hessenberg. The only
sub-diagonal of $H^{(j)}$ that ever becomes zero is the last one and this is purely a function
of the shift. If a shift is equal to an eigenvalue, then we no longer have an unreduced
Hessenberg matrix and instead we only continue working with the leading sub-matrix
of $H^{(j)}$ that remains unreduced. It should be emphasized that the conclusions of
Lemma 3.1 hold in exact arithmetic. An elegant extension of Lemma 3.1 to the case
where $p$ shifts are applied is proved by Miminis and Paige [53]. However, we show in
Chapter 5 that computing in finite precision may have dramatic effects that degrade
the expected performance of multiple shifts.

The following properties are consequences of the iteration. The first two are easily
established using mathematical induction; see for example [86, 97]. The third is a
standard result that does not depend upon the condition that $H$ is a Hessenberg
matrix.

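Lemma 3.2 Let $Z^{(p)} = Q^{(1)} \cdots Q^{(p)}$. Then $H Z^{(p)} = Z^{(p)} H^{(p+1)}$.

Proof The result follows by a simple induction argument since it easily follows that
$H^{(2)} = R^{(1)} Q^{(1)} + \tau_1 I = (Q^{(1)})^H H^{(1)} Q^{(1)} - \tau_1 I + \tau_1 I = (Q^{(1)})^H H Q^{(1)}$. □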

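Theorem 3.2 Let $Z^{(p)} = Q^{(1)} \cdots Q^{(p)}$ and $T^{(p)} = R^{(p)} \cdots R^{(1)}$. Then
$Z^{(p)} T^{(p)} = \mathcal{P}(H)$ where $\mathcal{P}(\lambda) = (\lambda - \tau_1) \cdots (\lambda - \tau_p)$.

Proof For $p = 1$ the result is Line 2.1 of Algorithm 3.1. Suppose the result is true
for $p - 1$. From Line 2.2 of Algorithm 3.1 and Lemma 3.2 we have

$R^{(p)} = (H^{(p+1)} - \tau_p I)(Q^{(p)})^H = (Z^{(p)})^H (H - \tau_p I) Z^{(p)} (Q^{(p)})^H$,

and note that $Z^{(p)} (Q^{(p)})^H = Z^{(p-1)}$. Thus

$T^{(p)} = R^{(p)} T^{(p-1)} = (Z^{(p)})^H (H - \tau_p I) Z^{(p-1)} T^{(p-1)}$,

which results in $Z^{(p)} T^{(p)} = (H - \tau_p I) Z^{(p-1)} T^{(p-1)} = \mathcal{P}(H)$ by the induction hypothesis.
□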

Theorem 3.3 Suppose that $H \in R^{n \times n}$ and let $\mathcal{P}(\lambda) = (\lambda - \tau_1) \cdots (\lambda -
\tau_p)$ be a polynomial. If $H s_i = s_i \lambda_i$ where $s_i \neq 0$ then

(3.1.1)  $\mathcal{P}(H) s_i = s_i \mathcal{P}(\lambda_i)$.

Each $H^{(j)}$ computed by the iteration is orthogonally similar to the original $H$
according to Lemma 3.2. Theorem 3.2 tells us that the explicitly shifted QR-iteration
computes the QR factorization of $\mathcal{P}(H)$. The proof is due to Stewart [86, page 353].
Since $T^{(p)}$ is an upper triangular matrix, the first $k$ columns of $Z^{(p)}$ are an orthogonal
basis for the space spanned by the first $k$ columns of $\mathcal{P}(H)$.

Since any of the shifts might have a nonzero imaginary part, the matrix $Z^{(p)}$ is
in general unitary. In practical computation, the constructed $Z^{(p)}$ is orthonormal as
long as two of the shifts applied form a complex conjugate pair. The details of the
application of a complex conjugate pair of shifts in real arithmetic are delayed until
§ 3.4. Unless otherwise stated, we assume that if a shift has a nonzero imaginary part
then its complex conjugate pair is also applied.

If any shift $\tau_j$ is equal to an eigenvalue $\lambda_i$ of $H$, then Theorem 3.3 gives that
$\mathcal{P}(\lambda_i) = 0$. Thus, the non-zero eigenvalues of $H$ used as shifts are zero eigenvalues
of $\mathcal{P}(H)$. The previous three results will prove useful for the remainder of the thesis.
For the present they allow us to establish the following theorem.

Theorem 3.4 Suppose that $AU = UH$ is an upper Hessenberg decomposition
of $A$ where $H$ has positive sub-diagonal elements. Suppose that
Algorithm 3.1 is used with the $p$ shifts $\tau_1, \ldots, \tau_p$ on $H$ resulting in $H^{(p+1)}$.
Let $Z^{(p)} = Q^{(1)} \cdots Q^{(p)}$ and $\mathcal{P}(\lambda) = (\lambda - \tau_1) \cdots (\lambda - \tau_p)$.

If $A V_k = V_k H_k + f_k e_k^T$ is an Arnoldi factorization with the first column of
$V_k$ equal to $\rho \mathcal{P}(A) U e_1$ where $\rho^{-1} = \| \mathcal{P}(A) U e_1 \|$, then $H_k$ is the same as the
leading principal sub-matrix of order $k$ of $H^{(p+1)}$ and $V_k = U Z^{(p)} [e_1, \ldots, e_k]$
for $k = 1, \ldots, n$.

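Proof Let $AU = UH$ be an upper Hessenberg decomposition of $A$ where $H$ has
positive sub-diagonal elements. Using Lemma 3.2 it follows that

(3.1.2)  $A U Z^{(p)} = U H Z^{(p)} = U Z^{(p)} H^{(p+1)}$.

Partition $U Z^{(p)} = [\, W_k \;\; W_{n-k} \,]$ and

$H^{(p+1)} = \begin{bmatrix} H_k^{(p+1)} & M_k^{(p+1)} \\ \beta_{k+1}^{(p+1)} e_1 e_k^T & H_{n-k}^{(p+1)} \end{bmatrix}$.

Equate the first $k$ columns of equation (3.1.2) to obtain

$A W_k = W_k H_k^{(p+1)} + \beta_{k+1}^{(p+1)} (W_{n-k} e_1) e_k^T$.

Theorem 3.2 gives $Z^{(p)} T^{(p)} = \mathcal{P}(H)$ where $T^{(p)} = R^{(p)} \cdots R^{(1)}$. But $\mathcal{P}(H) =
\mathcal{P}(U^T A U) = U^T \mathcal{P}(A) U$ which implies that $U Z^{(p)} T^{(p)} = \mathcal{P}(A) U$. If $\rho_{11} = e_1^T T^{(p)} e_1$
then $U Z^{(p)} T^{(p)} e_1 = U Z^{(p)} e_1 \rho_{11}$ which gives $U Z^{(p)} e_1 = \rho_{11}^{-1} \mathcal{P}(A) U e_1$. Theorem 2.1
of Chapter 2 (Implicit Q) then gives that $H_k = H_k^{(p+1)}$, $V_k = W_k$, and $f_k =
\beta_{k+1}^{(p+1)} (W_{n-k} e_1)$ with $\rho = \rho_{11}^{-1}$. □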
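The identification in Theorem 3.4 can be reproduced on a small example. The sketch below makes several simplifying assumptions that are not part of the thesis: it takes $A = H$ and $U = I$, uses two real shifts, and normalizes each QR factorization so that $R$ has a positive diagonal, which keeps the sub-diagonal of $H^{(p+1)}$ positive and so matches the sign convention of an Arnoldi process with positive norms. The helpers `arnoldi` and `shifted_qr_positive` are hypothetical names.

```python
import numpy as np

def arnoldi(A, v1, k):
    """k-step Arnoldi factorization A V = V H + f e_k^T (illustrative helper)."""
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(k):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w           # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

def shifted_qr_positive(H, shifts):
    """Explicitly shifted QR steps with diag(R) normalized to be positive."""
    n = H.shape[0]
    Hj, Z = H.copy(), np.eye(n)
    for tau in shifts:
        Q, R = np.linalg.qr(Hj - tau * np.eye(n))
        d = np.sign(np.diag(R))
        Q, R = Q * d, d[:, None] * R     # Q D and D R, with D = diag(d)
        Hj = R @ Q + tau * np.eye(n)
        Z = Z @ Q
    return Hj, Z

rng = np.random.default_rng(3)
n, k = 6, 4
H = np.triu(rng.standard_normal((n, n)), -1)
i = np.arange(n - 1)
H[i + 1, i] = np.abs(H[i + 1, i]) + 0.5  # positive sub-diagonal elements
shifts = [0.3, -0.7]

H_next, Z = shifted_qr_positive(H, shifts)

# Arnoldi started from P(H) e_1 (normalized inside arnoldi)
e1 = np.eye(n)[:, 0]
start = (H - shifts[0] * np.eye(n)) @ (H - shifts[1] * np.eye(n)) @ e1
V, Hk, f = arnoldi(H, start, k)
```

Under these assumptions `Hk` agrees with the leading $k \times k$ block of `H_next` and `V` with the first $k$ columns of `Z`, up to rounding error.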

We remark that if in the theorem's hypothesis the $m$-th sub-diagonal of $H$ is zero,
then the conclusion only holds for $k = 1, \ldots, m$. A fundamental identification between
an Arnoldi factorization and an explicitly shifted QR iteration is established. The first
$m$ columns of $U Z^{(p)}$ are an orthogonal basis for the Krylov subspace $\mathcal{K}_m(A, \rho \mathcal{P}(A) U e_1)$.
In words, every step of a QR-iteration defines a Krylov subspace and hence an Arnoldi
factorization. The immediate benefit is to establish the convergence typical of an
Arnoldi iteration.

3.2  Convergence  of  an  Explicitly  Shifted  QR  Iteration 

The main result of this section gives conditions that determine the convergence of
the explicitly shifted QR iteration on Hessenberg matrices. Parlett [60] presents the
first set of comprehensive sufficient conditions for convergence of the QR-iteration
on Hessenberg matrices while a portion of the paper by Parlett and Poole [65]
considers a geometric convergence theory for Hessenberg matrices. A comprehensive
geometric convergence theory for the shifted QR iteration is presented by Watkins
and Elsner [100] within the more general framework of generic GR algorithms. A GR
algorithm is an iterative procedure such as in Algorithm 3.1 where the QR factorization
is replaced with any other decomposition of the form $GR = H - \tau I$ where $R$ is
upper triangular and $G$ is a nonsingular matrix.

Theorem 3.5 Let $H \in R^{n \times n}$ be an unreduced upper Hessenberg matrix
and $\Psi(\lambda)$ be a polynomial. Order the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ of $H$
so that $|\Psi(\lambda_1)| \ge |\Psi(\lambda_2)| \ge \cdots \ge |\Psi(\lambda_n)|$. Let $HQ = QR$ be a real Schur
decomposition where the first $k$ columns of $Q$ span an eigenspace corresponding
to the eigenvalues $\lambda_1, \ldots, \lambda_k$. Suppose $k$ is a positive integer less
than $n$ such that $\rho_k = |\Psi(\lambda_{k+1})| / |\Psi(\lambda_k)| < 1$.

If a sequence of shifts $\{\tau_i\}_{i=1}^{m}$ has the properties that

$\mathcal{P}_m(\lambda_i) = (\lambda_i - \tau_1) \cdots (\lambda_i - \tau_m) \to \Psi(\lambda_i)$,  $i = k+1, \ldots, n$

$\mathcal{P}_m(\lambda_i) \neq 0$,  $i = 1, \ldots, k$

$\prod_{i=1}^{m} \tau_i \in R$

as $m \to \infty$, then Algorithm 3.1 computes an upper Hessenberg matrix

$H^{(m+1)} = \begin{bmatrix} H_k^{(m+1)} & M_k^{(m+1)} \\ \beta_{k+1}^{(m+1)} e_1 e_k^T & H_{n-k}^{(m+1)} \end{bmatrix}$

and an orthogonal matrix $Z^{(m)}$ such that for every value of $\hat{\rho}_k$ satisfying
$\rho_k < \hat{\rho}_k < 1$ there exists a constant $C$ such that

$|\beta_{k+1}^{(m+1)}| \le C (\hat{\rho}_k)^m$ and $\mathrm{dist}(Q_k, Z_k^{(m)}) \le C (\hat{\rho}_k)^m$

where $Z_k^{(m)} = Z^{(m)} [e_1, \ldots, e_k]$.

Proof See Theorems 5.4 and 6.2 of Watkins and Elsner [100]. □

The distance between the subspaces [18, 35] $\mathcal{R}(Q_k)$ and $\mathcal{R}(Z_k^{(m)})$ may be shown to
be equal to $\sqrt{1 - \| Q_k^T Z_k^{(m)} \|_2^2}$. For increasing values of $m$, the approximating subspace
$\mathcal{R}(Z_k^{(m)})$ aligns itself with $\mathcal{R}(Q_k)$. Thus $\mathrm{dist}(Q_k, Z_k^{(m)}) \to 0$ and the eigenvalues of
$H_k^{(m+1)}$ tend to $\lambda_1, \ldots, \lambda_k$. It follows from the theorem that for all values of $k$ such
that $\rho_k < 1$, the $k$-th sub-diagonal element of $H^{(m+1)}$ tends to zero.

The hypothesis on the product of the shifts ensures that if one is applied with a
nonzero imaginary part, then its complex conjugate is also a shift. The hypothesis
on $\rho_k$ implies that complex conjugate pairs of eigenvalues are kept together; $\lambda_i = \bar{\lambda}_j$
only if $i, j \le k$.

The theorems proved by Watkins and Elsner in [100] identify the convergence of
the QR algorithm with that of simultaneous iteration, or subspace iteration. The
QR-iteration uses the starting subspace $\mathrm{Span}\{e_1, e_2, \ldots, e_k\}$. This is easily seen
by using Theorem 3.2 and equating the first $k$ columns of $Z^{(m)} T^{(m)} = \mathcal{P}_m(H)$. This
forms the basis of a geometric convergence theory for the QR-iteration and other GR
algorithms.


3.2.1  Implications  for  a  Shifting  Strategy 

The following example is due to Watkins and Elsner [100, page 30] and illustrates the
use of Theorem 3.5.

Suppose that $\{\lambda_1, \ldots, \lambda_k\} \cup \{\lambda_{k+1}, \ldots, \lambda_n\}$ is a disjoint partition of the spectrum
of an unreduced upper Hessenberg matrix $H \in R^{n \times n}$. We also assume that the complex
conjugate pairs of eigenvalues are kept together; $\lambda_i = \bar{\lambda}_j$ implies that $i, j \le k$ or
$i, j > k$. Define the polynomials

$\Psi(\lambda) = (\lambda - \lambda_{k+1}) \cdots (\lambda - \lambda_n)$ and $\mathcal{P}_m(\lambda) = (\lambda - \tau_1) \cdots (\lambda - \tau_m)$.

The shifts $\tau_i$ are chosen so that $\Psi(\lambda_i) - \mathcal{P}_m(\lambda_i) \to 0$ as $m \to \infty$, for $i = k+1, \ldots, n$
but $\mathcal{P}_m(\lambda_j) \neq 0$ for $j = 1, \ldots, k$ and that there exists a positive integer $m_0$ such that
for all integers $m > m_0$,

(3.2.1)  $\min_{j=1,\ldots,k} |\mathcal{P}_m(\lambda_j)| > \max_{j=k+1,\ldots,n} |\mathcal{P}_m(\lambda_j)|$.

It is also assumed that if any of the shifts has a nonzero imaginary part, its complex
conjugate is also a shift. If $\rho_k = |\Psi(\lambda_{k+1})| / |\Psi(\lambda_k)| < 1$, then Theorem 3.5 gives
that Algorithm 3.1 computes a sequence of Hessenberg matrices $H^{(m+1)}$ and orthogonal
matrices $Z^{(m)}$ such that

$\beta_{k+1}^{(m+1)} \to 0$ and $\mathrm{dist}(Q_k, Z_k^{(m)}) \to 0$

where $Z_k^{(m)} = Z^{(m)} [e_1, \ldots, e_k]$. It follows that $H Z_k^{(m)} = Z_k^{(m)} H_k^{(m+1)}$ is converging to
the partial real Schur decomposition of interest.

The search is for the best approximating polynomial $\mathcal{P}_m(\lambda)$ or equivalently, a
proper set of shifts. If, for example, $\tau_i = \lambda_{k+i}$ for $i = 1, \ldots, n-k$ then $\mathcal{P}_{n-k}(\lambda) = \Psi(\lambda)$
and $\rho_k = 0$. Thus, after application of the $n - k$ shifts, the leading principal
sub-matrix of order $k$ of the upper Hessenberg matrix $H^{(n-k+1)}$ computed by Algorithm 3.1
contains the eigenvalues $\lambda_1, \ldots, \lambda_k$.
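The effect of exact shifts can be seen on a small symmetric tridiagonal matrix. This is a hedged illustration rather than the thesis's own experiment: the test matrix, the choice of the two smallest eigenvalues as unwanted, and the tolerances are assumptions. After applying the $n - k$ unwanted eigenvalues as shifts, the coupling entry below the leading block of order $k$ is at rounding level and that block carries the wanted eigenvalues.

```python
import numpy as np

n, k = 6, 4
H = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

eigenvalues = np.linalg.eigvalsh(H)       # ascending order
unwanted = eigenvalues[:n - k]            # used as the n - k shifts
wanted = eigenvalues[n - k:]

Hj = H.copy()
for tau in unwanted:
    Q, R = np.linalg.qr(Hj - tau * np.eye(n))
    Hj = R @ Q + tau * np.eye(n)

coupling = abs(Hj[k, k - 1])              # entry (k+1, k) in 1-based indexing
leading_eigenvalues = np.sort(np.linalg.eigvals(Hj[:k, :k]).real)
```

In finite precision this clean deflation is not guaranteed for every matrix; the sensitivity of the computed result to the shifts is the forward instability analyzed in Chapter 5.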


3.2.2  Implications  for  an  Arnoldi  Factorization 

As mentioned in § 1.2.1 of Chapter 1, computing a partial real Schur decomposition
corresponding to a small subset of the eigenvalues of $A$ is the major goal of this
thesis. Since the size of $A$ is often so large as to prevent using the QR algorithm, let
us consider the possibility of computing just the leading portion of the iteration.
Let $AU = UH$ be a Hessenberg decomposition. Using the notation of Theorem 3.5,
we may write a length $k$ Arnoldi factorization as

(3.2.2)  $A U Z_k^{(m)} = U Z_k^{(m)} H_k^{(m+1)} + \beta_{k+1}^{(m+1)} (U Z^{(m)} e_{k+1}) e_k^T$.

Suppose that $A Q_k = Q_k R_k$ is a partial real Schur decomposition of order $k$. Expand

(3.2.3)  $U Z_k^{(m)} e_1 = Q_k x_k + r$,

where $Q_k^T r = 0$. Note that $r = (I - Q_k Q_k^T) U Z_k^{(m)} e_1$. The norm of $r$ measures the
sine of the angle between $\mathcal{R}(Q_k)$ and the first column of $U Z_k^{(m)}$. If $H_k^{(m+1)}$ is
unreduced, then Theorem 2.5 shows that $r$ approaches zero if and only if $\beta_{k+1}^{(m+1)}$ does.
Theorem 3.5 gives the convergence rate of $\beta_{k+1}^{(m+1)}$ to zero given a shifting strategy.
However, the shifting strategy has the effect of replacing the starting vector, which
re-starts the factorization of equation (3.2.2). The IRA-iteration, introduced in the
next chapter, is motivated by precisely this idea.

3.3  Duality  of  the  QR  iteration  and  Krylov  Spaces 

The following theorem establishes a fundamental relationship between the QR algorithm
and inverse iteration.

Theorem 3.6 Suppose that $H - \tau I \in \mathbf{R}^{n \times n}$ is a nonsingular Hessenberg
matrix. If $H - \tau I = QR$ where $Q \in \mathbf{R}^{n \times n}$ is orthogonal and $R \in \mathbf{R}^{n \times n}$ is
upper triangular, then

(3.3.1)  $(H - \tau I)^{-T} = QL$,

where $L = R^{-T}$.

Proof The result follows easily by first inverting the equation $H - \tau I = QR$ and
then taking the transpose of both sides. □

The proof of the theorem shows that the hypothesis that $H$ is an upper Hessenberg
matrix may be removed. The only crucial hypothesis is that of nonsingularity.
Post-multiplying both sides of equation (3.3.1) with $e_n$ results in

$(H - \tau I)^{-T} e_n \rho_n = Q e_n$

where $\rho_n = e_n^T R e_n$. Apparently, one step of the QR-iteration amounts to inverse iteration
with $(H - \tau I)^{-T}$ on the vector $e_n$. The implication is that while the QR-iteration
builds an orthogonal factorization for the Krylov subspace $\mathcal{K}_n(A - \tau I, U Q e_1)$ it is
also building one for $\mathcal{K}_n((A - \tau I)^{-T}, U Q e_n)$, where $AU = UH$ is a Hessenberg
decomposition. We call the latter space the dual Krylov subspace of $\mathcal{K}_n(A - \tau I, U Q e_1)$.
This duality and the convergence theory developed for Krylov subspaces in § 2.6 of
Chapter 2 help to explain why the Hessenberg decomposition helps to sort the spectral
information of $A$. Indeed, practical shifting strategies for the QR algorithm use
information in $\mathcal{K}_j((A - \tau I)^{-T}, U Q e_n)$ for $j = 1, 2$.
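
The relation between one QR factorization and one step of inverse iteration can be observed directly. The sketch below is illustrative only; the test matrix, the shift value `tau`, and all variable names are assumptions, and NumPy's `qr`, `inv`, and `solve` stand in for whatever factorization a production code would use.

```python
import numpy as np

rng = np.random.default_rng(0)
n, tau = 6, 0.3
# Upper Hessenberg test matrix; the diagonal boost keeps H - tau*I well conditioned.
H = np.triu(rng.standard_normal((n, n)), k=-1) + 4.0 * np.eye(n)
M = H - tau * np.eye(n)

Q, R = np.linalg.qr(M)
L = np.linalg.inv(R).T                  # L = R^{-T}, lower triangular

lhs = np.linalg.inv(M).T                # (H - tau I)^{-T}
rho_n = R[-1, -1]                       # rho_n = e_n^T R e_n
e_n = np.zeros(n)
e_n[-1] = 1.0
# One step of inverse iteration with (H - tau I)^{-T} on e_n, scaled by rho_n.
inverse_iterate = rho_n * np.linalg.solve(M.T, e_n)
```

Up to rounding, `lhs` equals `Q @ L` as in (3.3.1), and `inverse_iterate` equals the last column of `Q`.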

3.4  The  Practical  QR  algorithm

This  section  briefly  reviews  some  of  the  practical  issues  affecting  the  convergence  of 
Algorithm  3.1.  The  issues  considered  include: 

•  Deflation. 

•  Selection  of  shifts. 

•  The  implicitly  shifted  QR  iteration. 

•  Computing  eigenvectors  and  reordering  the  Schur  decomposition. 

Our  discussion  is  patterned  after  those  in  Demmel  [24],  Golub  and  Van  Loan  [35], 
and  Stewart  [86]. 

For  the  remainder  of  the  section  we  continue  to  assume  that  AU  =  UH  is  an 
Hessenberg  decomposition  of  A  and  that  H  is  an  unreduced  upper  Hessenberg  matrix. 


3.4.1  Deflation 


Suppose that after $m$ steps of Algorithm 3.1 we have

$H^{(m+1)} = \begin{bmatrix} H_{11}^{(m+1)} & H_{12}^{(m+1)} \\ \epsilon e_1 e_j^T & H_{22}^{(m+1)} \end{bmatrix}$

where $H_{11}^{(m+1)} \in \mathbf{R}^{j \times j}$ for $1 \le j < n$. If $\epsilon$ is suitably small we may set it to zero; this
is called deflation. This is justified since

$A U Z^{(m)} = U Z^{(m)} H^{(m+1)}$

and setting $E = -\epsilon (U Z^{(m)} e_{j+1})(U Z^{(m)} e_j)^T$ it follows that

$(A + E) U Z^{(m)} = U Z^{(m)} \begin{bmatrix} H_{11}^{(m+1)} & H_{12}^{(m+1)} \\ 0 & H_{22}^{(m+1)} \end{bmatrix}$.

Since $\|E\| = |\epsilon|$, deflating the sub-diagonal element is equivalent to computing the
eigenvalues of a matrix near $A$. After deflation, two unreduced Hessenberg matrices
remain. Since computing the eigenvalues of a matrix determines the roots of its characteristic
polynomial, deflation is equivalent to factoring the characteristic polynomial
for a nearby matrix.

A criterion used by both EISPACK [82] and LAPACK [1] is to check if
$|\beta_j^{(m+1)}| \le \eta (|\alpha_{j-1}^{(m+1)}| + |\alpha_j^{(m+1)}|)$ for $2 \le j \le n$, where $\beta_j^{(m+1)}$ and $\alpha_j^{(m+1)}$ denote the
sub-diagonal and diagonal elements of $H^{(m+1)}$ and $\eta$ is the machine precision. Since
$|\alpha_j^{(m+1)}| \le \|A\|$ the criterion deflates sub-diagonals that are small relative to the matrix. Every
sub-diagonal element of $H^{(m+1)}$ that satisfies the above inequality is set to
zero. If $\beta_n^{(m+1)}$ is negligible, then $\alpha_n^{(m+1)}$ is an approximate eigenvalue and we continue
the QR-iteration on the leading sub-matrix of $H^{(m+1)}$ of order $n - 1$. Francis [33]
also explains how deflation may be performed if the product of two consecutive sub-diagonal
elements is small.
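
As an illustration, the deflation test stated above can be written in a few lines. This is a sketch of the inequality as given in the text, not the actual EISPACK or LAPACK source; the function name `deflate_subdiagonals` and the storage of the diagonal and sub-diagonal in two lists are assumptions.

```python
import sys

def deflate_subdiagonals(alpha, beta, eta=sys.float_info.epsilon):
    """Zero negligible sub-diagonal entries of a Hessenberg matrix.

    alpha[j] is the j-th diagonal entry and beta[j] the sub-diagonal entry
    in row j (0-based, so beta[0] is unused).
    """
    out = list(beta)
    for j in range(1, len(alpha)):
        # |beta_j| <= eta * (|alpha_{j-1}| + |alpha_j|)
        if abs(beta[j]) <= eta * (abs(alpha[j - 1]) + abs(alpha[j])):
            out[j] = 0.0
    return out

alpha = [4.0, 3.0, 2.0, 1.0]
beta = [0.0, 0.5, 1e-17, 1e-3]
deflated = deflate_subdiagonals(alpha, beta)
```

In this example only the entry `1e-17` falls below the threshold and is set to zero, splitting the matrix into two smaller Hessenberg blocks.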


3.4.2  Shift  selection 

Although Theorem 3.5 indicates the convergence expected of Algorithm 3.1 given a
set of shifts, the important question of selecting one has gone unanswered. From
Lemma 3.1 we expect the last sub-diagonal element to become small after a QR step
with a shift close to an eigenvalue of $H$. According to the results of § 3.3, the lower
right hand corner of $H^{(m)}$ contains some important spectral information. Consider
the residual of using $(e_n, \tau)$ as an approximation to an eigenpair of $(H^{(m)})^T$:

$\|(H^{(m)} - \tau I)^T e_n\| = \|(H^{(m)})^T e_n - \tau e_n\| = \|(\alpha_n^{(m)} - \tau) e_n + \beta_n^{(m)} e_{n-1}\| \ge |\beta_n^{(m)}|$.

The Rayleigh quotient $\tau = e_n^T H^{(m)} e_n = \alpha_n^{(m)}$ results in the minimum residual. Since the
eigenvalues of $(H^{(m)})^T$ are the same as those of $H^{(m)}$, the previous discussion suggests
that Algorithm 3.1 use the sequence of Rayleigh quotients as shifts.

Assuming the hypotheses of Theorem 3.5 on $\rho_{n-1}$ are met, $\beta_n^{(m)}$ tends toward
zero. In fact, a straightforward calculation shows that before the last plane rotation
necessary for the QR factorization of $H^{(m)} - \tau I$ we have
where the are plane rotations. After completing the QR step it follows that
Once < 1 then $\beta_n^{(m+1)}$ goes to zero at a quadratic rate. In particular if $H^{(m)}$ is
symmetric, then $\gamma_n = \beta_n^{(m)}$ and the rate improves to a cubic one.

A shifting strategy due to Wilkinson [101, 35] has generally been adopted for the
practical implementations of the QR algorithm [1, 82]. Suppose that $\nu_1$ and $\nu_2$ are the
eigenvalues of the two by two block in the south-east corner of $H^{(m)}$. If $\nu_1$ and $\nu_2$
are both real numbers, then Wilkinson's shift is defined to be the value of $\nu_i$ closest to
$\alpha_n^{(m)}$. Otherwise, the eigenvalues of this two by two block form a complex conjugate
pair. The next section considers an efficient manner in which a complex conjugate
pair of shifts is applied.
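
A minimal sketch of this shift selection follows. The function name `wilkinson_shift` and the convention of passing the four entries of the trailing two by two block are assumptions; in the complex case the sketch simply returns both members of the conjugate pair, leaving their application to the implicit double shift of the next section.

```python
import cmath

def wilkinson_shift(a, b, c, d):
    """Shift from the trailing 2x2 block [[a, b], [c, d]] of H^(m).

    Returns the eigenvalue closest to d when both are real; otherwise
    returns the complex conjugate pair (nu1, nu2).
    """
    half_trace = 0.5 * (a + d)
    disc = 0.25 * (a - d) ** 2 + b * c      # discriminant of the 2x2 block
    if disc >= 0.0:
        root = disc ** 0.5
        nu1, nu2 = half_trace + root, half_trace - root
        return nu1 if abs(nu1 - d) <= abs(nu2 - d) else nu2
    root = cmath.sqrt(disc)
    return (half_trace + root, half_trace - root)

real_shift = wilkinson_shift(4.0, 1.0, 1.0, 1.0)       # both eigenvalues real
complex_pair = wilkinson_shift(0.0, 1.0, -1.0, 0.0)    # eigenvalues +i and -i
```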

3.4.3  The  Implicitly  Shifted  QR  iteration 

Theorem 2.1 (Implicit Q) of Chapter 2 gives conditions under which the Hessenberg
decomposition of $(H - \nu_1 I)(H - \nu_2 I)$ is unique. If $H$ is an unreduced Hessenberg
matrix, then the decomposition $(H - \nu_1 I)(H - \nu_2 I)$ is specified by the first column
of $U$. Francis [33] also observed that

(3.4.1)  $(H - \nu_1 I)(H - \nu_2 I) e_1 = \eta_1 e_1 + \eta_2 e_2 + \eta_3 e_3$.

From Theorem 3.2 we have

$Z^{(2)} R^{(2)} = Q^{(1)} Q^{(2)} R^{(2)} R^{(1)} = (H - \nu_1 I)(H - \nu_2 I) = H^2 - 2\,\mathrm{Real}(\nu_1) H + |\nu_1|^2 I$,

the last equality holding when $\nu_2 = \bar{\nu}_1$. This implies that $\eta_1$, $\eta_2$, and $\eta_3$ are real
numbers when $\nu_1$ is the complex conjugate
of $\nu_2$. Thus, in theory, two consecutive QR steps with a complex conjugate pair of
shifts may be applied in real arithmetic by computing the similarity transformation
$(Z^{(2)})^T H Z^{(2)}$. But there is a more efficient manner in which to apply a
complex conjugate pair of shifts. By the Implicit Q Theorem Francis concluded that
only the values of $\eta_1, \eta_2, \eta_3$ are needed since only they are used when computing the first
column of a QR factorization of $(H - \nu_1 I)(H - \nu_2 I)$.
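
The observation (3.4.1) is easy to confirm numerically. In the sketch below the test matrix and the shift `nu1` are hypothetical; the closed-form expressions for `eta` follow from multiplying out $H^2 e_1$ for an upper Hessenberg $H$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
H = np.triu(rng.standard_normal((n, n)), k=-1)   # real upper Hessenberg

nu1 = 0.7 + 0.4j          # hypothetical shift; nu2 is its conjugate
nu2 = np.conj(nu1)

I = np.eye(n)
e1 = I[:, 0]
explicit = (H - nu1 * I) @ (H - nu2 * I) @ e1            # complex arithmetic
real_form = (H @ H - 2.0 * nu1.real * H + abs(nu1) ** 2 * I) @ e1

# Only eta_1, eta_2, eta_3 are needed, computed from the leading 3x2 part of H.
s, t = 2.0 * nu1.real, abs(nu1) ** 2
eta = np.array([
    H[0, 0] ** 2 + H[0, 1] * H[1, 0] - s * H[0, 0] + t,
    H[1, 0] * (H[0, 0] + H[1, 1] - s),
    H[1, 0] * H[2, 1],
])
```

The vector `explicit` has zero imaginary part and vanishes below its third entry, so real arithmetic on a handful of entries of $H$ suffices to start the double-shift step.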

Suppose that $W_0$ is a Householder reflector that transforms the vector defined
by the right hand side of equation (3.4.1) into $\|\eta_1 e_1 + \eta_2 e_2 + \eta_3 e_3\| e_1$. Computing
$W_0^T H W_0$ has the unfortunate side effect of destroying the Hessenberg structure in
the leading principal sub-matrix of order four. The implicitly shifted QR iteration
is defined by computing a Householder matrix $W_i$ so that $(W_0 \cdots W_i)^T H W_0 \cdots W_i$
for $i = 0, \ldots, n-1$ is upper Hessenberg through its first $i$ columns. It may be
easily shown [35] that $W_i e_1 = e_1$ for $1 \le i \le n-1$ and so
$W_0 \cdots W_{n-1} e_1 = W_0 e_1 = (\eta_1 e_1 + \eta_2 e_2 + \eta_3 e_3) / \|\eta_1 e_1 + \eta_2 e_2 + \eta_3 e_3\|$.
Hence, as long as $H$ is an unreduced upper Hessenberg matrix,
the explicit and implicit QR-iterations are the same.

Finally,  the  QR  algorithms  of  EISPACK  [82]  and  LAPACK  [1]  also  implicitly  apply 
Wilkinson’s  shift.  Implicitly  applying  a  shift  avoids  subtracting  the  shift  from  the 
diagonal  elements  of  H,  possibly  preventing  loss  of  information  due  to  cancelation. 
An  example  of  this  is  presented  in  §  5.3  of  Chapter  5.  We  also  remark  that  a  number  of 
shifts  may  be  implicitly  applied.  This  is  the  basis  for  the  multi-shift  QR-iteration  [8] 
of  Bai  and  Demmel.  Both  Dubrulle  [27]  and  Watkins  [96]  discuss  the  multi-shift 
algorithm  and  present  explanations  of  why  the  algorithm  performs  poorly  when  the 
number  of  shifts  applied  is  large. 


3.4.4  Computing  Eigenvectors  and  Reordering  the  Schur  Decomposition 


Suppose that $A \in \mathbf{R}^{n \times n}$ is reduced to upper quasi-triangular form by the QR algorithm:

(3.4.2)  $Q^T A Q = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix} = R$,

where $Q$ is the orthogonal matrix computed by the algorithm. Equation (3.4.2) is
a Schur form for $A$ of order $n$ where the sub-matrices $R_{11}$ and $R_{22}$ are of order $k$
and $n - k$, respectively. Assume that the spectra of $R_{11}$ and $R_{22}$ are distinct. In
practice, the order in which the computed eigenvalues of $A$ appear on the diagonal of
$R$ is determined by Theorem 3.5.

If all the eigenvectors of $A$ are required, an upper quasi-triangular matrix $S$ may
be computed so that $RS = SD$ where $D$ is the quasi-diagonal portion of $R$. It follows
that $AQS = QSD$. Further details are considered by Demmel [24] and Golub and
Van Loan [35].

In many situations, only a small number, say $k$, of eigenvectors are requested. If
the corresponding eigenvalues are found in $R_{11}$, then the first $k$ columns of $Q$ are an
orthogonal basis for the invariant subspace corresponding to the eigenvalues of $R_{11}$. The eigenvectors
are easily determined by computing those of $R_{11}$. Suppose that $R_{11} S_1 = S_1 D_1$ is a
quasi-diagonal form; then $A Q_k S_1 = Q_k S_1 D_1$, where $Q_k$ denotes the first $k$ columns of $Q$.

If eigenvalues of interest are located in $R_{22}$ and a basis for the associated eigenspace
is wanted then we must either increase the number of columns of $Q$ used or somehow
place them at the top of $R$. Algorithms for re-ordering a Schur form accomplish this
task by using orthogonal matrices to move the wanted eigenvalues to the top of $R$.
The recent work of Bai and Demmel [9] attempts to correct the occasional numerical
problems encountered by Stewart's algorithm [87] EXCHNG. Their work was motivated
by that of Ruhe [68] and that of Dongarra, Hammarling, and Wilkinson [25].
Both algorithms swap consecutive 1x1 and 2x2 blocks of a quasi-triangular matrix
to attain the desired ordering.

Suppose that the matrix $R$ of equation (3.4.2) is of order two. EXCHNG constructs
a plane rotation that zeros the second component of the eigenvector corresponding
to the eigenvalue $\lambda_2 = R_{22}$. A similarity transformation is performed on $R$ with
the plane rotation and the diagonal blocks are interchanged. We refer to a strategy
that constructs an orthogonal matrix and performs a similarity transformation to
interchange the eigenvalues as a direct swapping algorithm.
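
For the order two case the direct swap can be sketched as follows. This illustrates the idea described above and is not Stewart's EXCHNG code; the function name `direct_swap_2x2` is an assumption, and the sketch presumes that $R$ is not a multiple of the identity (otherwise the eigenvector used below is zero).

```python
import numpy as np

def direct_swap_2x2(R):
    """Swap the diagonal entries of a 2x2 upper triangular R by a plane rotation."""
    a, b, c = R[0, 0], R[0, 1], R[1, 1]
    x = np.array([b, c - a])             # eigenvector of R for lambda_2 = c
    r = np.hypot(x[0], x[1])
    cs, sn = x[0] / r, x[1] / r
    G = np.array([[cs, -sn], [sn, cs]])  # plane rotation with G.T @ x = (r, 0)
    return G.T @ R @ G, G

R = np.array([[1.0, 2.0], [0.0, 3.0]])
swapped, G = direct_swap_2x2(R)
```

For this `R` the similarity transformation yields the upper triangular matrix with diagonal entries 3 and 1, so the eigenvalues have been interchanged.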

Consider the following alternate iterative swapping algorithm: Perform a similarity
transformation on $R$ with an arbitrary orthogonal matrix followed by one step of
the QR-iteration with shift equal to $\lambda_2$. The arbitrary orthogonal similarity transformation
introduces a non-zero off-diagonal element in the (2,1) entry so that the
transformed $R$ is an unreduced upper Hessenberg matrix with the diagonal blocks
coupled. Lemma 3.1 implies that the (2,1) entry is zeroed since an eigenvalue is used
as a shift and hence $\lambda_1$ and $\lambda_2$ are switched.

If the order of $R_{22}$ is equal to two, EXCHNG uses the iterative swapping strategy
using a standard double shift to re-order the diagonal blocks. The direct swapping
algorithm, instead, computes an appropriate orthogonal matrix by computing the QR
factorization of a basis of two vectors that span the desired invariant subspace. The
reader is referred to [9, 25] for further details.

An example and explanation for the occasional failure of Stewart's algorithm is
considered in § 5.4 of Chapter 5.

Chapter  4


Re-starting  an  Arnoldi  Iteration 


The previous two chapters considered in detail two fundamental algorithms for computing
approximations to the eigenvalues of $A$. The Arnoldi/Lanczos algorithms
are appropriate when the matrix $A$ is so large that storage and computational requirements
prohibit completing anything but a length $k \ll n$ factorization with
Algorithm 2.2. If only a small subset of the eigenvalues is desired, the length $k$
Arnoldi factorization may suffice. The analysis of Chapter 2 indicates that
a strategy for finding $k$ eigenvalues in a length $k$ factorization is to find an appropriate
starting vector that forces the residual $f_k$ to vanish. However, working in finite precision
arithmetic generally removes the possibility of the computed residual ever vanishing
exactly, even if a length $n$ factorization is built.

The QR algorithm, on the other hand, computes an approximation to a real Schur
decomposition of $A$. All the eigenvalues of $A$ are approximated and the eigenvectors
are readily available. Theorem 3.4 of Chapter 3 shows the relationship between
the Arnoldi/Lanczos and QR algorithms. In exact arithmetic, when using the same
starting vector, both algorithms generate the same orthogonal and upper Hessenberg
matrices. Forcing the residual to zero for the Arnoldi/Lanczos algorithms has the
effect of deflating a sub-diagonal element during the QR algorithm.

The idea of re-starting the Arnoldi iteration is motivated by Theorems 2.1 and 2.5.
Our goal will be to construct a starting vector that is a member of the invariant
subspace of interest. Theorem 2.5 then gives that the residual vector associated with
the truncated Arnoldi factorization vanishes. This chapter considers two re-starting
variants. The first variant, introduced by Saad [74], explicitly re-starts the Arnoldi
factorization and is the subject of § 4.1. The second approach is to implicitly re-start
the factorization. This IRA-iteration, introduced by Sorensen [83], is the subject
of § 4.2. A numerical example is presented in § 4.3 that serves to illustrate how
both variants perform in practice. The important subject of polynomial iterations or
acceleration methods is examined in § 4.4. This includes the polynomial iterations of
Saad and a careful look at the IRA-iteration. Finally, § 4.4.3 examines an interesting
explicitly re-started approach recently introduced by Scott [80].


4.1  Explicitly  Re-starting  the  Arnoldi  Iteration 

Suppose that $k \ll n$ eigenvalues of $A$ require approximation. As explained in § 1.2.1
of Chapter 1, the $k$ eigenvalues of $A$ of interest are called the wanted ones. The ERA-iteration
starts by building an Arnoldi factorization of length $k + p$ for some positive
integer $p$. An improved starting vector is then obtained by using a specific linear
combination of the columns of $V_{k+p}$. The linear combination is determined by the
spectral information of $H_{k+p}$. The ERA-iteration is defined by repeating the above
process. Algorithm 4.1 outlines the procedure, followed by some comments.


Algorithm 4.1 (Explicitly Re-started Arnoldi Iteration)

Input: A unit vector $v_1^{(1)}$.

1.1 For $j = 1, 2, \ldots$ until convergence

    2.1 Build an Arnoldi factorization of length $k + p$ given a starting
        vector $v_1^{(j)}$ :

        $A V_{k+p}^{(j)} = V_{k+p}^{(j)} H_{k+p}^{(j)} + f_{k+p}^{(j)} e_{k+p}^T$ ;

    2.2 Compute the decomposition :

        $H_{k+p}^{(j)} S_{k+p}^{(j)} = S_{k+p}^{(j)} D_{k+p}^{(j)}$

        where $D_{k+p}^{(j)}$ is a quasi-diagonal form for $H_{k+p}^{(j)}$ ordered so
        that the wanted eigenvalues are in the leading portion of $D_{k+p}^{(j)}$ ;

    2.3 If $k$ wanted eigenvalues of $H_{k+p}^{(j)}$ have converged then
        exit the current loop ;

    2.4 Compute the unit vector :

        $v_1^{(j+1)} = V_{k+p}^{(j)} s^{(j)}$

        where $s^{(j)} \leftarrow \gamma_1^{(j)} S_{k+p}^{(j)} e_1 + \cdots + \gamma_k^{(j)} S_{k+p}^{(j)} e_k$ and $\|s^{(j)}\| = 1$.

1.2 End For

1.3 If desired, compute the Ritz vectors $x_i^{(j)} = V_{k+p}^{(j)} s_i^{(j)}$
    where $H_{k+p}^{(j)} s_i^{(j)} = s_i^{(j)} \theta_i^{(j)}$.

We briefly address the issues of determining convergence, the choice of $p$, and how
the coefficients $\gamma_i^{(j)}$ for $i = 1, \ldots, k$ are selected.

An eigenvalue of $H_{k+p}^{(j)}$ (or equivalently, a Ritz value of $A$) is converged when it
satisfies the stopping criterion of § 2.5 of Chapter 2. A practical implementation of
Algorithm 4.1 would include the deflation of converged Ritz values during the course
of the iteration. Chapter 6 discusses deflation rules in detail.

The  choice  of  p  is  usually  a  tradeoff  between  the  length  of  a  factorization  that 
may  be  tolerated  and  the  rate  of  convergence.  From  the  results  on  the  convergence  of 
Krylov  spaces  in  §  2.6,  the  accuracy  of  the  Ritz  values  typically  increases  as  p  does. 
However,  for  increasing  p,  the  number  of  Arnoldi  vectors  stored  as  well  as  the  size 
of  the  Hessenberg  matrix  increases.  For  most  problems,  the  size  of  p  is  determined 
experimentally. 

The selection of the expansion coefficients is the most unsettling decision that
needs to be made. Saad first suggested [74] choosing the coefficients so that the
slowest converging Ritz vectors are favored the most. For example, the $\gamma_i^{(j)}$ may be
taken to be the Ritz estimates for the $i$-th wanted Ritz values during the $j$-th iteration
of the loop, properly normalized so that the unit vector $v_1^{(j+1)}$ results. The use of
polynomial filters that better employ the spectral information of $H_{k+p}^{(j)}$ to determine
an improved starting vector is addressed in § 4.4. Since the new starting vector
computed at line 2.4 is a linear combination of the columns of $V_{k+p}^{(j)}$, there is a unique
vector $c_{k+p} \in \mathbf{R}^{k+p}$ such that the relation $K_{k+p}(A, v_1^{(j)}) c_{k+p} = v_1^{(j+1)}$ holds, where
$K_{k+p}(A, v_1^{(j)})$ denotes the Krylov matrix. In other
words, the new starting vector is determined by applying a polynomial of at most
degree $k + p - 1$ in $A$ to the current starting one.
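
One cycle of the explicit re-start can be sketched as follows. This is a minimal illustration under stated assumptions, not a full implementation of Algorithm 4.1: the matrix is taken symmetric so that the Ritz pairs are real, the wanted eigenvalues are assumed to be the $k$ largest, the weights are the Ritz estimates in the spirit of Saad's suggestion, and the helper `arnoldi` and all variable names are hypothetical.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Length-m Arnoldi factorization A V = V H + f e_m^T (one reorthogonalization)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w -= V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w          # reorthogonalize once
        w -= V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w                      # w is the residual f

rng = np.random.default_rng(2)
n, k, p = 12, 2, 3
B = rng.standard_normal((n, n))
A = B + B.T                             # symmetric, so Ritz pairs are real
v1 = rng.standard_normal(n)
v1 /= np.linalg.norm(v1)
V, H, f = arnoldi(A, v1, k + p)

theta, S = np.linalg.eigh((H + H.T) / 2)          # Ritz values, ascending
order = np.argsort(theta)[::-1][:k]               # wanted = k largest (assumption)
est = np.abs(S[-1, order]) * np.linalg.norm(f)    # Ritz estimates |e_m^T s_i| ||f||
s = S[:, order] @ est                   # favor the slowly converging Ritz vectors
s /= np.linalg.norm(s)
v_new = V @ s                           # line 2.4 of Algorithm 4.1

# The same vector expressed through the Krylov matrix [v1, A v1, ..., A^{k+p-1} v1].
K = np.column_stack([np.linalg.matrix_power(A, i) @ v1 for i in range(k + p)])
c, *_ = np.linalg.lstsq(K, v_new, rcond=None)
```

The least squares fit recovers coefficients `c` with `K @ c` equal to `v_new` up to rounding, which is the polynomial interpretation of the re-start.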

Finally, Saad [78, page 234] suggests using a deflated algorithm when computing
$k > 1$ Ritz values. The idea is to compute one Ritz value and approximate Schur
vector at a time. The process uses an ERA-iteration to compute an approximate
Ritz pair, taking care that the Arnoldi vectors are orthogonalized against the approximate
Schur vectors. During each cycle of the iteration, the approximate Schur
vectors are kept in the leading portion of $V_{k+p}$, and the corresponding part of $H_{k+p}$ is
upper quasi-triangular. As each Ritz value converges, the corresponding Ritz vector
is orthogonalized against the approximate Schur basis to obtain another approximate
Schur vector. This orthogonalization procedure is further discussed in § 6.5 of Chapter 6
within the context of deflation. When the converged portion of the Arnoldi
factorization of the $j$-th cycle of the iteration contains a basis for an approximate
invariant subspace of dimension $k$, the deflated algorithm is halted. This procedure,
analogous to the one used by Scott [80], is considered in more detail in § 4.4.3.

4.2  The  Implicitly  Restarted  Arnoldi  Iteration

The IRA-iteration is motivated by re-starting the factorization in an implicit manner
as suggested in § 3.2.2 of Chapter 3. The scheme is called implicit because the updating
of the starting vector is accomplished with an implicitly shifted QR mechanism on
$H_k$. This allows us to update the starting vector by working with orthogonal
matrices that live in $\mathbf{R}^{k \times k}$ rather than in $\mathbf{R}^{n \times n}$.

The iteration starts by extending a length $k$ Arnoldi factorization by $p$ steps. Next,
$p$ shifted QR steps are performed on $H_{k+p}$. The last $p$ columns of the factorization are
discarded resulting in a length $k$ factorization. The iteration is defined by repeating
the above process until convergence.

As an example, suppose that $p = 1$ and $k$ represents the dimension of the desired
invariant subspace. Let $\mu$ be a real shift and let $H_{k+1} - \mu I = QR$ with $Q$ orthogonal
and $R$ upper triangular matrices, respectively. From equation (2.2.1) of Chapter 2,

$(A - \mu I) V_{k+1} - V_{k+1} (H_{k+1} - \mu I) = f_{k+1} e_{k+1}^T$,

$(A - \mu I) V_{k+1} - V_{k+1} Q R = f_{k+1} e_{k+1}^T$,

$(A - \mu I)(V_{k+1} Q) - (V_{k+1} Q)(R Q) = f_{k+1} e_{k+1}^T Q$,

(4.2.1)  $A (V_{k+1} Q) - (V_{k+1} Q)(R Q + \mu I) = f_{k+1} e_{k+1}^T Q$.


The matrices are updated via $V_{k+1}^+ \leftarrow V_{k+1} Q$ and $H_{k+1}^+ \leftarrow R Q + \mu I$ and the latter
matrix remains upper Hessenberg since $R$ is upper triangular and $Q$ is upper
Hessenberg. However, equation (4.2.1) is not quite a legitimate Arnoldi factorization.
Equation (4.2.1) fails to be an Arnoldi factorization since the matrix $f_{k+1} e_{k+1}^T Q$ has
a non-zero $k$-th column. Partitioning the matrices in the updated equation results in

(4.2.2)  $A \begin{bmatrix} V_k^+ & v_{k+1}^+ \end{bmatrix} = \begin{bmatrix} V_k^+ & v_{k+1}^+ \end{bmatrix} \begin{bmatrix} H_k^+ & \ast \\ \beta_{k+1}^+ e_k^T & \alpha_{k+1}^+ \end{bmatrix} + f_{k+1} \begin{bmatrix} \sigma_{k+1} e_k^T & \gamma_{k+1} \end{bmatrix}$,

where $\sigma_{k+1} = e_{k+1}^T Q e_k$ and $\gamma_{k+1} = e_{k+1}^T Q e_{k+1}$. Equating the first $k$ columns of
equation (4.2.2) gives

(4.2.3)  $A V_k^+ = V_k^+ H_k^+ + (\beta_{k+1}^+ v_{k+1}^+ + \sigma_{k+1} f_{k+1}) e_k^T$.

Performing the update $f_k^+ \leftarrow \beta_{k+1}^+ v_{k+1}^+ + \sigma_{k+1} f_{k+1}$ and noting that $(V_k^+)^T f_k^+ = 0$ it follows
that equation (4.2.3) is a length $k$ Arnoldi factorization.
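
The single shift update (4.2.1) to (4.2.3) can be traced numerically. The sketch below is an illustration under assumptions: the test matrix, the shift `mu`, and the helper `arnoldi` are hypothetical, and the shift is applied with an explicit QR factorization rather than implicitly.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Length-m Arnoldi factorization A V = V H + f e_m^T (one reorthogonalization)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w -= V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w          # reorthogonalize once
        w -= V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w                      # w is the residual f

rng = np.random.default_rng(3)
n, k, mu = 10, 3, 0.5
A = rng.standard_normal((n, n))
V, H, f = arnoldi(A, rng.standard_normal(n), k + 1)

Q, R = np.linalg.qr(H - mu * np.eye(k + 1))
V_plus = V @ Q
H_plus = R @ Q + mu * np.eye(k + 1)
sigma = Q[k, k - 1]                                    # e_{k+1}^T Q e_k
f_plus = H_plus[k, k - 1] * V_plus[:, k] + sigma * f   # updated residual

Vk, Hk = V_plus[:, :k], H_plus[:k, :k]
e_k = np.zeros(k)
e_k[-1] = 1.0
defect = A @ Vk - Vk @ Hk - np.outer(f_plus, e_k)      # should vanish, cf. (4.2.3)
new_start = (A - mu * np.eye(n)) @ V[:, 0]             # (A - mu I) v_1, unnormalized
```

Besides satisfying (4.2.3), the first column of `V_plus` is parallel to `new_start`, so the shift has replaced the starting vector $v_1$ by a multiple of $(A - \mu I) v_1$.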

The following elementary but technical result shows that the previous idea may
be extended for up to $1 \le p \le k$ shifts and a new length $k$ Arnoldi factorization
remains. A similar result was proved by Paige, Parlett and Van der Vorst in Lemma
1 of [58] for the Lanczos factorization.

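Lemma 4.1 Let $A V_{k+p} = V_{k+p} H_{k+p} + f_{k+p} e_{k+p}^T$ be a length $k+p$ Arnoldi
factorization where $H_{k+p}$ is an unreduced upper Hessenberg matrix. If

$\psi_p(\lambda) = \prod_{i=1}^{p} (\lambda - \mu_i)$,

then

(4.2.4)  $\psi_p(A) V_{k+p} = V_{k+p} \psi_p(H_{k+p}) + \sum_{j=1}^{p} \psi^{j+1}(A) f_{k+p} e_{k+p}^T \psi_{j-1}(H_{k+p})$,

where $\psi_j(\lambda) = \prod_{i=1}^{j} (\lambda - \mu_i)$ and $\psi^{j}(\lambda) = \prod_{i=j}^{p} (\lambda - \mu_i)$.

Moreover,

(4.2.5)  $\psi_p(A) V_k = V_{k+p} \psi_p(H_{k+p}) \begin{bmatrix} e_1 & e_2 & \cdots & e_k \end{bmatrix}$.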

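Proof The proof is by mathematical induction. Define $m = k + p$; the subscripts
are suppressed on $V_{k+p}$ and $H_{k+p}$ for the proof. Since $\psi_1(A) V = V \psi_1(H) + f e_m^T$ where
$\psi_1(\lambda) = \lambda - \mu_1$, the base case for $p = 1$ is established. Assume the lemma's truth
for polynomials $\psi_j(\lambda)$ of degree $j \le p$. Let $\psi_{p+1}(\lambda) = (\lambda - \mu_{p+1}) \psi_p(\lambda)$. Using the
induction hypothesis, it follows that

$\psi_{p+1}(A) V = (A - \mu_{p+1} I) \psi_p(A) V$

$= (A - \mu_{p+1} I) \left[ V \psi_p(H) + \sum_{j=1}^{p} \psi^{j+1}(A) f e_m^T \psi_{j-1}(H) \right]$

$= V (H - \mu_{p+1} I) \psi_p(H) + f e_m^T \psi_p(H) + (A - \mu_{p+1} I) \sum_{j=1}^{p} \psi^{j+1}(A) f e_m^T \psi_{j-1}(H)$

$= V \psi_{p+1}(H) + \sum_{j=1}^{p+1} \psi^{j+1}(A) f e_m^T \psi_{j-1}(H)$,

which is the desired result, where in the last line the polynomials $\psi^{j+1}$ are formed
with the $p + 1$ shifts.

Since $H$ is an unreduced upper Hessenberg matrix it follows that $e_i^T \psi_{p-1}(H) e_j = 0$ for
$i > j + p - 1$. Hence
the last matrix on the right-hand side of equation (4.2.4) is zero through its first $k$
columns, and equation (4.2.5) is established. □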
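
The second conclusion of the lemma lends itself to a quick numerical check. The sketch below uses a hypothetical test matrix, two hypothetical real shifts, and the helper names `arnoldi` and `poly`; it compares both sides of (4.2.5) and also shows that the identity fails when all $k + p$ columns are retained.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Length-m Arnoldi factorization A V = V H + f e_m^T (one reorthogonalization)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w -= V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w          # reorthogonalize once
        w -= V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

def poly(M, shifts):
    """psi_p(M) = (M - mu_1 I) ... (M - mu_p I)."""
    P = np.eye(M.shape[0])
    for mu in shifts:
        P = P @ (M - mu * np.eye(M.shape[0]))
    return P

rng = np.random.default_rng(4)
n, k, p = 10, 3, 2
A = rng.standard_normal((n, n))
V, H, f = arnoldi(A, rng.standard_normal(n), k + p)
shifts = [0.4, -1.1]                    # hypothetical real shifts mu_1, mu_2

lhs = poly(A, shifts) @ V[:, :k]        # psi_p(A) V_k
rhs = V @ poly(H, shifts)[:, :k]        # V_{k+p} psi_p(H_{k+p}) [e_1 ... e_k]
full_gap = np.linalg.norm(poly(A, shifts) @ V - V @ poly(H, shifts))
```

The difference `lhs - rhs` is at rounding level, while `full_gap` is not small, reflecting the residual terms of (4.2.4) that occupy the last $p$ columns.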

Denote the QR factorization of $\psi_p(H_{k+p}) = Z^{(p)} T^{(p)}$. Since $H_{k+p}$ is an unreduced
upper Hessenberg matrix, $e_i^T \psi_p(H_{k+p}) e_j = 0$ for $i > j + p$ and hence $Z^{(p)}$ shares this
same property. Partitioning

$\psi_p(H_{k+p}) = \begin{bmatrix} Z_k^{(p)} & \hat{Z}^{(p)} \end{bmatrix} \begin{bmatrix} T_k^{(p)} & M^{(p)} \\ 0 & \hat{T}^{(p)} \end{bmatrix}$

allows us to rewrite equation (4.2.5) as

(4.2.6)  $\psi_p(A) V_k = V_{k+p} Z_k^{(p)} T_k^{(p)}$.

In words, an IRA-iteration is equivalent to performing simultaneous iteration on the
matrix $V_k$ while working only with matrices of order $k + p$! The columns of
$V_{k+p} Z_k^{(p)}$ form an orthonormal basis for the range of $\psi_p(A) V_k$. This is analogous to the well known
connection between subspace and QR-iterations. Post-multiplication of an Arnoldi
factorization of length $k + p$ with $Z^{(p)}$ results in

$A V_{k+p} Z^{(p)} = V_{k+p} Z^{(p)} (Z^{(p)})^T H_{k+p} Z^{(p)} + f_{k+p} e_{k+p}^T Z^{(p)}$.

Equating the first $k$ columns results in

$A V_k^+ = V_k^+ H_k^+ + f_k^+ e_k^T$.

In direct analogy with the single shift case, the updated residual is

$f_k^+ = \beta_k^+ V_{k+p} Z^{(p)} e_{k+1} + \sigma_k^{(p)} f_{k+p}$,

where $\beta_k^+ = e_{k+1}^T (Z^{(p)})^T H_{k+p} Z^{(p)} e_k$ and $\sigma_k^{(p)} = e_{k+p}^T Z^{(p)} e_k$. The norm of $f_k^+$ is easily
seen to be $\sqrt{(\beta_k^+)^2 + (\sigma_k^{(p)})^2 \|f_{k+p}\|^2}$.

Application  of  the  shifts  is  performed  implicitly  as  in  the  QR  algorithm.  If  the 
shifts  are  in  complex  conjugate  pairs,  the  implicit  double  shift  is  used  to  avoid  complex 
arithmetic.  The  following  procedure  outlines  the  scheme. 

Algorithm 4.2 (Implicitly Re-started Arnoldi Iteration)

Input: A length $k$ Arnoldi factorization $A V_k^{(1)} = V_k^{(1)} H_k^{(1)} + f_k^{(1)} e_k^T$.

1.1 For $j = 1, 2, \ldots$ until convergence

    2.1 Extend the length $k$ Arnoldi factorization by $p$ steps :

        $A V_{k+p}^{(j)} = V_{k+p}^{(j)} H_{k+p}^{(j)} + f_{k+p}^{(j)} e_{k+p}^T$ ;

    2.2 If $k$ wanted eigenvalues $\{\theta_i^{(j)}\}_{i=1}^{k}$ of $H_{k+p}^{(j)}$ have converged exit the
        current loop ;

    2.3 Apply $p$ implicit QR steps with shifts $\mu_1^{(j)}, \ldots, \mu_p^{(j)}$ to $H_{k+p}^{(j)}$ to
        obtain $H_{k+p}^{(j)} Z^{(j)} = Z^{(j)} H_{k+p}^{(j+1)}$ ;

    2.4 Update the length $k + p$ Arnoldi factorization of Line 2.1 :

        $A V_{k+p}^{(j)} Z^{(j)} = V_{k+p}^{(j)} Z^{(j)} H_{k+p}^{(j+1)} + f_{k+p}^{(j)} e_{k+p}^T Z^{(j)}$ ;

    2.5 Obtain a length $k$ Arnoldi factorization by retaining only the
        first $k$ columns of the factorization in Line 2.4 :

        $A V_k^{(j+1)} = V_k^{(j+1)} H_k^{(j+1)} + f_k^{(j+1)} e_k^T$ ;

1.2 End For

1.3 If desired, compute the Ritz vectors $x_i^{(j)} = V_{k+p}^{(j)} s_i^{(j)}$
    where $H_{k+p}^{(j)} s_i^{(j)} = s_i^{(j)} \theta_i^{(j)}$.

One cycle of the iteration is illustrated in Figures 4.1 to 4.3. Theorem 3.4 implies
that after each cycle of the $j$ loop,

$v_1^{(j+1)} = V_k^{(j+1)} e_1$

$= V_{k+p}^{(j)} Z^{(j)} e_1$

$= V_{k+p}^{(j)} \psi_p^{(j)}(H_{k+p}^{(j)}) e_1$  (Theorem 3.4)

(4.2.7)  $= \psi_p^{(j)}(A) v_1^{(j)}$,

where $\psi_p^{(j)}(\lambda) = \tau^{(j)} (\lambda - \mu_1^{(j)}) \cdots (\lambda - \mu_p^{(j)})$ with $\tau^{(j)}$ a normalization factor. The
results of Theorem 3.5 determine the rate of convergence of the IRA-iteration given a
set of shifts. Recall the example at the end of § 3.2 concerning a specific choice for $\Psi$
and $\psi_m$ used by Theorem 3.5. The example implies that Algorithm 4.2 drives $f_k^{(j)}$ to
zero if the discrete min-max problem (3.2.1) of Chapter 3 is solved. If the sequence of
shifts $\{\mu_1^{(i)}, \ldots, \mu_p^{(i)}\}_{i=1}^{j}$ defining the polynomial $\psi_{jp}(\lambda) = \psi_p^{(1)}(\lambda) \cdots \psi_p^{(j)}(\lambda)$ is a good
approximation to $\Psi(\lambda) = (\lambda - \lambda_{k+1}) \cdots (\lambda - \lambda_n)$, then the IRA-iteration converges to
the desired invariant subspace. Theorem 3.5 implies that the magnitude of the ratio
of $\psi_{jp}(\lambda_k)$ to $\psi_{jp}(\lambda_{k+1})$ gives the convergence rate.

Figure 4.1 The set of rectangles represents the matrix equation
$V_{k+p} H_{k+p} + f_{k+p} e_{k+p}^T$ of an Arnoldi factorization. The unshaded region on the
right is a zero matrix of $k + p - 1$ columns.

Figure 4.2 After performing $p$ implicitly shifted QR steps on $H_{k+p}^{(j)}$, the
middle set of pictures illustrates $V_{k+p}^{(j)} Z^{(j)} [(Z^{(j)})^T H_{k+p}^{(j)} Z^{(j)}] + f_{k+p}^{(j)} e_{k+p}^T Z^{(j)}$. The
last $p + 1$ columns of $f_{k+p}^{(j)} e_{k+p}^T Z^{(j)}$ are non-zero because of the QR-iteration.

Figure 4.3 After discarding the last $p$ columns, the final set represents
$V_k^{(j+1)} H_k^{(j+1)} + f_k^{(j+1)} e_k^T$ of a length $k$ Arnoldi factorization.

Numerous choices are possible for the selection of the $p$ shifts. One immediate
choice is to use the $p$ unwanted eigenvalues of $H_{k+p}^{(j)}$. This exact shifting scheme and
others are discussed in § 4.4 on polynomial iterations. Exact shifts are equivalent to
Rayleigh quotients: If $H_{k+p}^{(j)} s = s \theta$ for a unit vector $s$ then the identity
$(V_{k+p}^{(j)} s)^T A V_{k+p}^{(j)} s = \theta$ follows from the
Arnoldi factorization of length $k + p$. Unlike the QR-iteration, the IRA-iteration, or
partial QR-iteration, does not have access to the spectral information necessary for
the rapid convergence of the practical QR algorithm.
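
The steps of Algorithm 4.2 with exact shifts can be sketched as below. This is a minimal illustration under several assumptions, not the implementation studied in this thesis: the matrix is symmetric so that all Ritz values are real, the wanted eigenvalues are taken to be the $k$ largest, the shifts are applied with explicit QR factorizations of the small matrix instead of implicitly, and the names `extend` and `ira`, the tolerance, and the test matrix are hypothetical.

```python
import numpy as np

def extend(A, V, H, f, m):
    """Extend a length-k Arnoldi factorization (V, H, f) to length m."""
    n, k = V.shape
    Vm = np.zeros((n, m))
    Hm = np.zeros((m, m))
    Vm[:, :k] = V
    Hm[:k, :k] = H
    for j in range(k, m):
        beta = np.linalg.norm(f)
        Vm[:, j] = f / beta
        Hm[j, j - 1] = beta
        w = A @ Vm[:, j]
        h = Vm[:, :j + 1].T @ w
        w -= Vm[:, :j + 1] @ h
        c = Vm[:, :j + 1].T @ w          # reorthogonalize once
        w -= Vm[:, :j + 1] @ c
        Hm[:j + 1, j] = h + c
        f = w
    return Vm, Hm, f

def ira(A, v1, k, p, tol=1e-10, max_cycles=200):
    """Implicitly re-started Arnoldi sketch for symmetric A, k largest eigenvalues."""
    v1 = v1 / np.linalg.norm(v1)
    alpha = v1 @ (A @ v1)
    V, H = v1[:, None], np.array([[alpha]])      # length-1 factorization
    f = A @ v1 - alpha * v1
    m = k + p
    for cycle in range(max_cycles):
        V, H, f = extend(A, V, H, f, m)                     # line 2.1
        theta, S = np.linalg.eigh((H + H.T) / 2)            # real Ritz pairs
        order = np.argsort(theta)[::-1]
        wanted, unwanted = order[:k], order[k:]
        vals, X = theta[wanted], V @ S[:, wanted]           # Ritz values and vectors
        est = np.abs(S[-1, wanted]) * np.linalg.norm(f)     # Ritz estimates
        if np.all(est < tol):                               # line 2.2
            break
        Z = np.eye(m)
        for mu in theta[unwanted]:                          # line 2.3, exact shifts
            Q, R = np.linalg.qr(H - mu * np.eye(m))
            H = R @ Q + mu * np.eye(m)
            Z = Z @ Q
        f = H[k, k - 1] * (V @ Z[:, k]) + Z[m - 1, k - 1] * f   # updated residual
        V, H = V @ Z[:, :k], H[:k, :k]                      # lines 2.4 and 2.5
    return vals, X, cycle + 1

# Test problem: three well separated wanted eigenvalues, the rest in [0, 1].
A = np.diag(np.concatenate([np.linspace(0.0, 1.0, 27), [8.0, 9.0, 10.0]]))
vals, X, cycles = ira(A, np.ones(30), k=3, p=5)
```

On this test problem the returned Ritz values approximate 10, 9, and 8, and each returned Ritz vector `x` satisfies the Rayleigh quotient identity `x @ A @ x` equal to its Ritz value up to rounding.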

As for the ERA-iteration, the number of shifts to apply at each cycle of the above
iteration is problem dependent. The only formal requirement is that $1 \le p \le n - k$.
However, computational experience indicates that $p \ge k$ is preferable. Chapter 7
discusses the many tradeoffs when trying to select the size of $p$ relative to $k$.

The following result and Lemma 4.1 were communicated* by C. A. Beattie†.

Theorem 4.3 Assume the same hypothesis of Lemma 4.1. Suppose that
$\{\mu_i\}_{i=1}^{p}$ is a set of shifts such that if $\mu_i$ has a nonzero imaginary part, then
$\bar{\mu}_i = \mu_j$ is also a shift for some $j \neq i$. If

$\psi_p(\lambda) = \prod_{i=1}^{p} (\lambda - \mu_i)$,

and $\psi_p(A)$ is non-singular, then

(4.2.8)  $\mathcal{K}_k(A, v_1^+)^{\perp} = \psi_p(A^T)^{-1} \mathcal{K}_k(A, v_1)^{\perp}$,

where $v_1^+ = V_{k+p} Z^{(p)} e_1$ and $\psi_p(H_{k+p}) = Z^{(p)} T^{(p)}$ is a QR factorization.

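Proof Suppose that $w \in \psi_p(A^T)^{-1} \mathcal{K}_k(A, v_1)^{\perp}$. Then $w = \psi_p(A)^{-T} y$ for some vector $y \in \mathcal{K}_k(A, v_1)^{\perp}$. If $z \in \psi_p(A) \mathcal{K}_k(A, v_1)$ then $z = \psi_p(A) x$ for some vector $x \in \mathcal{K}_k(A, v_1)$. Thus,

$w^T z = y^T \psi_p(A)^{-1} \psi_p(A) x = y^T x = 0$,

and hence $w \in \{\psi_p(A) \mathcal{K}_k(A, v_1)\}^{\perp}$, establishing

(4.2.9) $\psi_p(A^T)^{-1} \mathcal{K}_k(A, v_1)^{\perp} \subseteq \{\psi_p(A) \mathcal{K}_k(A, v_1)\}^{\perp}$.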


*Workshop on Krylov Subspace Methods and Applications, Raleigh, NC, March 17-18, 1995
†Department of Mathematics, Virginia Polytechnic Institute and State University
Since $\psi_p(A)$ is nonsingular by hypothesis, it follows that $\psi_p(A^T)^{-1}$ exists and hence

$\dim\{\psi_p(A^T)^{-1} \mathcal{K}_k(A, v_1)^{\perp}\} = \dim\{\mathcal{K}_k(A, v_1)^{\perp}\}$,

and

$\dim\{\psi_p(A) \mathcal{K}_k(A, v_1)\} = \dim\{\mathcal{K}_k(A, v_1)\}$.

Along with equation (4.2.9), the previous relations imply that $\{\psi_p(A) \mathcal{K}_k(A, v_1)\}^{\perp} = \psi_p(A^T)^{-1} \mathcal{K}_k(A, v_1)^{\perp}$.

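By the second conclusion of Lemma 4.1, equation (4.2.5),

$\psi_p(A) \mathcal{K}_k(A, v_1) = \mathcal{R}\{\psi_p(A) V_k\}$

$= \mathcal{R}\{V_{k+p} \psi_p(H_{k+p}) [\, e_1 \ e_2 \ \cdots \ e_k \,]\}$

$= \mathcal{R}\{V_{k+p} Z^{(p)} T^{(p)} [\, e_1 \ e_2 \ \cdots \ e_k \,]\}$,

where a QR factorization of $\psi_p(H_{k+p})$ is $Z^{(p)} T^{(p)}$. The theorem is proved since $\psi_p(A) \mathcal{K}_k(A, v_1) = \mathcal{K}_k(A, v_1^+)$ where $v_1^+ = V_{k+p} Z^{(p)} e_1$. □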

Suppose that $A$ is non-singular and that the grade of $v_1$ is $n$; in other words, the dimension of $\mathcal{K}_n(A, v_1)$ is $n$. If $AV_n = V_nH_n$ is a Hessenberg decomposition with $V_ne_1 = v_1$ then $\mathcal{K}_{n-k}(A^{-T}, V_ne_n) = \mathcal{K}_k(A, v_1)^{\perp}$. The theorem shows that, analogous to the duality of the QR-iteration discussed in § 3.3, during each cycle of an IRA-iteration another IRA-iteration takes place on the Krylov subspace dual to $\mathcal{K}_k(A, v_1)$.

4.3  Explicit  and  Implicit  Re-starting 

This section presents a striking example that compares the ERA- and IRA-iterations. Let $A \in \mathbf{R}^{10 \times 10}$ be zero everywhere except for diagonal elements

$\alpha_{11} = 1$, $\alpha_{22} = 1$, $\alpha_{33} = 0$, $\alpha_{44} = 0$, $\alpha_{ii} = (i - 1) \cdot 10^{-1}$ for $i = 5, \ldots, 9$,

and ones on the sub-diagonal. Suppose that the vector $v_1 = e_1$ is used to start both Algorithms 4.1 and 4.2 with $k = 2$ and $p = 2$ and the interest is to compute the two eigenvalues equal to one. Using an exact shift strategy, Algorithm 4.2 computes the approximate partial real Schur decomposition $AQ_2 = Q_2R_2$ where

$R_2 = \begin{bmatrix} .94919 & .95789 \\ -2.6952 \cdot 10^{-3} & 1.0508 \end{bmatrix}$

with eigenvalues equal to $1 \pm i\,1.129168612228906 \cdot 10^{-8}$. The number of iterations needed was four and a total of ten matrix vector products were computed.

But Algorithm 4.1 stagnates. In fact, the same information was computed during every cycle of the iteration. For $j \ge 1$,

$H_{k+p}^{(j)} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}$.

The MATLAB function EIG computes the two eigenvectors

$s_1^T = [\, 0 \quad .57735 \quad .57735 \quad .57735 \,]$,

$s_2^T = [\, 1.7 \cdot 10^{-18} \quad -.57735 \quad -.57735 \quad -.57735 \,]$,

corresponding to the two eigenvalues equal to one. If the expansion coefficients are chosen equal to the corresponding normalized Ritz estimates, the vector $v_1^{(j+1)} = e_1$ is computed during every cycle of the ERA-iteration.
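The alignment of the two computed eigenvectors is easy to reproduce. The sketch below uses NumPy's `numpy.linalg.eig` in place of MATLAB's EIG (an assumption, so the printed digits may differ from those quoted above). It then builds an orthonormal basis for the invariant subspace belonging to the eigenvalue one from the null space of $(H - I)^2$; this is one convenient way to obtain a well-conditioned basis for that subspace, and is not the procedure used in the thesis.

```python
import numpy as np

# The 4 x 4 Hessenberg matrix that the ERA-iteration reproduces at every cycle.
H = np.array([[1.0, 0.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])

w, S = np.linalg.eig(H)
idx = np.where(np.abs(w - 1.0) < 0.5)[0]   # the two eigenvalues near one
s1, s2 = S[:, idx[0]], S[:, idx[1]]
alignment = abs(np.vdot(s1, s2))           # |cosine| of the angle between them

# Orthonormal basis of the invariant subspace for the eigenvalue one,
# taken from the right singular vectors spanning the null space of (H - I)^2.
N = (H - np.eye(4)) @ (H - np.eye(4))
_, sigma, Vt = np.linalg.svd(N)
basis = Vt[2:].T

print(alignment)                           # very close to one
print(np.linalg.norm(N @ basis))           # close to zero
```

The eigenvalue one is defective here, so the two eigenvectors returned are numerically parallel, whereas the orthonormal basis spans the full two-dimensional invariant subspace.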

The major drawback of using a linear combination of the eigenvectors of $H_{k+p}^{(j)}$ is that they may form a poor choice for the starting vector. If $H_{k+p}^{(j)}$ is defective, then there might not be enough eigenvectors corresponding to the wanted eigenvalues. As the previous example demonstrated, computing in finite precision arithmetic blurs this sharp characterization. A pair of approximate eigenvectors is produced that are aligned to working precision. On the other hand, using an expansion in terms of the Schur vectors of $H_{k+p}^{(j)}$ is a better behaved numerical process. As explained in § 4.4.2, the IRA-iteration implicitly uses a Schur basis of $H_{k+p}^{(j)}$. Golub and Wilkinson [38] examine the many practical difficulties involved when computing invariant subspaces. As the above example shows, computing in floating point arithmetic generally removes the possibility of ever detecting a defective matrix.

Among the several advantages an implicit updating scheme possesses over an explicit one are:

• Only $p$ matrix vector products are required during each iteration instead of $k + p$.

• Maintaining a prescribed level of orthogonality for only $p$ additional Arnoldi vectors during each iteration instead of $k + p$.

• Re-starting with a linear combination of Schur vectors instead of eigenvectors.

• Ability to avoid explicit application of $\psi(A)$.

• The incorporation of the well understood numerical and theoretical behavior of the practical QR algorithm.

The last point was first mentioned by Sorensen [83]. This thesis makes a detailed study of the relationship with the QR algorithm. In particular, application of a shift may result in one of the sub-diagonal elements of $H_{k+p}$ becoming small. The impact of the deflation strategies associated with the QR-iteration upon the IRA-iteration is the subject of Chapter 6. The convergence of the iteration to selected portions of the spectrum of $A$ may then be answered by appealing to the theory developed in Chapter 3.


4.4  Polynomial  Iterations 

As explained in § 4.2, each iteration of Algorithms 4.1 and 4.2 implicitly replaces the starting vector of an Arnoldi factorization with $\psi(A)v_1$ where subscripts are dropped for ease of notation. If $A$ is diagonalizable where $z_j$ for $j = 1, \ldots, n$ are the eigenvectors, then it follows that $v_1 = z_1\zeta_1 + \cdots + z_n\zeta_n$ and then

(4.4.1) $\psi(A)v_1 = z_1\psi(\lambda_1)\zeta_1 + \cdots + z_n\psi(\lambda_n)\zeta_n$.

Assuming that the eigenpairs $(z_i, \lambda_i)$ are ordered so that the $k$ wanted ones are at the beginning of the expansion, a polynomial of degree $p$ is sought so that

(4.4.2) $\max_{i = k+1, \ldots, n} |\psi(\lambda_i)| < \min_{i = 1, \ldots, k} |\psi(\lambda_i)|$.

A good polynomial $\psi(\lambda)$ acts as a filter. Components in the direction of unwanted eigenvectors are damped or, equivalently, components in the direction of wanted eigenvectors are amplified. We remark that according to Theorem 3.5 the convergence of the QR-iteration is also dependent upon the same discrete min-max polynomial approximation problem.
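A small worked instance of the filter in equations (4.4.1) and (4.4.2) may help. The diagonal matrix, the equal expansion coefficients, and the choice of roots below are assumptions made purely for illustration; with a diagonal $A$ the eigenvectors are the coordinate vectors, so the filtered coefficients can be read off directly.

```python
import numpy as np

lam = np.array([6.0, 5.0, 4.0, 3.0, 2.0, 1.0])  # eigenvalues, wanted ones first
k = 2                                            # number of wanted eigenvalues
shifts = np.array([3.0, 2.0, 1.0])               # p = 3 roots of the filter
A = np.diag(lam)                                 # eigenvectors are z_j = e_j
zeta = np.ones(lam.size) / np.sqrt(lam.size)     # expansion coefficients of v_1

def psi(x):
    """psi(x) = prod_i (x - mu_i), evaluated elementwise."""
    return np.prod([x - mu for mu in shifts], axis=0)

# Apply psi(A) to v_1 by repeated shifted products.
v = zeta.copy()
for mu in shifts:
    v = A @ v - mu * v

wanted = np.abs(psi(lam[:k]))     # |psi| at the wanted eigenvalues: 60 and 24
unwanted = np.abs(psi(lam[k:]))   # |psi| at the unwanted ones: 6, 0, 0, 0
print(wanted.min(), unwanted.max())
```

The three unwanted eigenvalues used as roots are annihilated, the remaining unwanted one is damped relative to the wanted pair, and the min-max condition (4.4.2) holds for this choice.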

It should be emphasized that even if a good approximate solution is computed for the discrete min-max problem defined by equation (4.4.2), the unwanted products $\psi(\lambda_i)\zeta_i$ may not be small. This can only happen if the unwanted coefficients $\zeta_i$ are large. As we demonstrated in § 2.6 of Chapter 2, if the corresponding eigenvalue $\lambda_i$ is poorly conditioned, then $\zeta_i$ may be large. The conclusion is that unwanted components in the direction of an eigenvector corresponding to a poorly conditioned eigenvalue may not be expected to become negligible. In addition, if $A$ is defective, it may not possess enough eigenvectors corresponding to the wanted eigenvalues $\lambda_1, \ldots, \lambda_k$. Numerically, problems are encountered when a basis for the desired invariant subspace is poorly conditioned.

4.4.1  The  Polynomial  Iterations  of  Saad 

The acceleration techniques and hybrid methods presented by Saad in Chapter seven of [78] are motivated by attempting to compute a reasonable solution of the min-max problem defined by equation (4.4.2). Saad suggests a two stage process for calculating approximations to wanted eigenvectors.

First, an Arnoldi factorization of length $k + p$ is built. The spectrum of the upper Hessenberg matrix of order $k + p$ is used to determine a polynomial $p_m(\lambda)$ of degree $m$. Examples include the Chebyshev polynomials [76], based on the scheme of Manteuffel [50], and the least squares polynomials [77] introduced by Saad. Second, the polynomial $p_m(\lambda)$ of degree $m$ is applied to a linear combination of the wanted eigenvectors of the upper Hessenberg matrix of order $k + p$. The resulting vector is said to be filtered. A Ritz vector is then determined using the filtered one. Within the context of Algorithm 4.1, the filtered starting vector is just another choice for the starting vector in Line 2.4. The above process is repeated until $k$ wanted Ritz values converge. As mentioned at the end of § 4.1, the above iterated process may be used within a deflated ERA algorithm.

4.4.2  Implicit  Polynomial  Iterations 

The IRA-iteration implicitly applies a polynomial iteration to a linear combination of Schur vectors spanning a wanted eigenspace of $H_{k+p}$. This section presents several results that serve to motivate the exact shifting strategy introduced in § 4.2. The first theorem presented is a generalization of Lemma 3.10 proved by Sorensen [83]. The major difference is that there is no assumption on the existence of a basis of eigenvectors for the desired invariant subspace. Only a Schur basis is used.

Theorem 4.4 Suppose $H \in \mathbf{R}^{m \times m}$ is an unreduced upper Hessenberg matrix corresponding to a length $m$ Arnoldi factorization $AV = VH + fe_m^T$ and that the eigenvalues of $H$ are in the disjoint partition

$\{\theta_1, \ldots, \theta_k\} \cup \{\theta_{k+1}, \ldots, \theta_m\}$.

Assume that the complex conjugate pairs of eigenvalues are kept together;

$\theta_i = \bar{\theta}_j$ implies that $i, j \le k$ or $i, j > k$.

If $m - k$ QR steps are performed with the shifts $\theta_{k+1}, \ldots, \theta_m$ producing an orthogonal matrix $Q \in \mathbf{R}^{m \times m}$ then

(4.4.3) $Q^THQ = \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix}$,

where the eigenvalues of $H_{22}$ are $\theta_{k+1}, \ldots, \theta_m$.

Moreover, the updated starting vector produced by Algorithm 4.2, given an exact shifting strategy, is

(4.4.4) $VQe_1 \in \mathcal{R}\{VQ_1Z_1\}$,

where $Q_1 = Q[e_1, \ldots, e_k]$ and $H_{11}Z_1 = Z_1T_1$ is a partial real Schur decomposition and

(4.4.5) $A(VQ_1) = (VQ_1)H_{11} + (e_m^TQ_1e_k)fe_k^T$,

is the updated Arnoldi factorization of length $k$.

Proof The matrix equation (4.4.3) is a direct result of Theorem 3.5.

Partition $Q = [\, Q_1 \ Q_2 \,]$ where $HQ_1 = Q_1H_{11}$. Let $H_{11}Z_1 = Z_1T_1$ be a real Schur decomposition and it follows that

$VQe_1 = VQ_1e_1 = VQ_1Z_1Z_1^Te_1 = VQ_1Z_1y$,

where $y = Z_1^Te_1$. Partition the updated length $m$ Arnoldi factorization of the hypothesis as

(4.4.6) $AV[\, Q_1 \ Q_2 \,] = V[\, Q_1 \ Q_2 \,] \begin{bmatrix} H_{11} & H_{12} \\ 0 & H_{22} \end{bmatrix} + fe_m^T[\, Q_1 \ Q_2 \,]$.

Equating the first $k$ columns of equation (4.4.6) results in

$A(VQ_1) = (VQ_1)H_{11} + (e_m^TQ_1e_k)fe_k^T$

since, by construction, $[\, Q_1 \ Q_2 \,]$ is zero below the $(m - k)$-th sub-diagonal. □

Using the exact shifting strategy during the IRA-iteration replaces the starting vector with a linear combination of the wanted approximate Schur vectors. The ERA-iteration also has the same goal, but the IRA-iteration performs this replacement implicitly in a stable fashion using a Schur basis of $H$. In addition, the IRA-iteration avoids the need to re-start the next factorization from scratch. Note that as $m \to n$, Theorem 3.5 implies that the exact shifting strategy places improving approximations of the wanted eigenvalues in $H_{11}$ in a stable manner.

The restriction that keeps the complex conjugate pairs of eigenvalues together is only needed so that the iteration may be done in real arithmetic. The hypothesis of Theorem 4.4 concerning the disjoint partition of the eigenvalues of $H$ may be removed. A result by Miminis and Paige [53, pages 391-395], briefly mentioned in § 3.1, makes this hypothesis superfluous. They prove that if $m - k$ QR steps are performed then the matrix equation (4.4.3) results if and only if the $m - k$ shifts are eigenvalues of $H$, regardless of their multiplicity.
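The conclusions of Theorem 4.4 can be observed numerically. The sketch below is an illustration under several assumptions rather than Algorithm 4.2 itself: the helper `arnoldi`, the random symmetric test matrix, and the sizes are invented for the example, and the shifted QR steps are applied explicitly with `numpy.linalg.qr` rather than implicitly by bulge chasing. A symmetric matrix is used so that all shifts are real.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Return V, H, f with A V = V H + f e_m^T (V has m orthonormal columns)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w          # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

rng = np.random.default_rng(0)
n, k, p = 40, 3, 5                      # illustrative sizes (assumed)
m = k + p
B = rng.standard_normal((n, n))
A = (B + B.T) / 2
V, H, f = arnoldi(A, rng.standard_normal(n), m)

theta, S = np.linalg.eigh((H + H.T) / 2)   # Ritz values in ascending order
shifts = theta[:p]                          # unwanted: the p smallest
Z = V @ S[:, p:]                            # wanted Ritz vectors (orthonormal)

# m - k explicit shifted QR steps with the exact shifts, accumulating Q.
Q = np.eye(m)
Hp = H.copy()
for mu in shifts:
    Qj, Rj = np.linalg.qr(Hp - mu * np.eye(m))
    Hp = Rj @ Qj + mu * np.eye(m)
    Q = Q @ Qj

v_plus = V @ Q[:, 0]                        # updated starting vector V Q e_1
outside = np.linalg.norm(v_plus - Z @ (Z.T @ v_plus))

# The same vector, up to sign and scaling, from the explicit filter psi(A) v_1.
w = V[:, 0].copy()
for mu in shifts:
    w = A @ w - mu * w
w /= np.linalg.norm(w)

print(abs(Hp[k, k - 1]), outside, abs(w @ v_plus))
```

The coupling entry between the leading $k \times k$ block and the trailing block is at rounding level, the new starting vector lies in the span of the wanted Ritz vectors, and it agrees with the explicitly filtered vector, in line with equations (4.4.3) and (4.4.4).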

Algorithm 4.2 with the exact shift strategy builds an orthogonal basis for a number of Krylov subspaces simultaneously. The following is a slight generalization of Theorem 3 proved by Morgan [54].


Theorem 4.5 Assume the same hypothesis and notation as Theorem 4.4 with the additional hypothesis that $f \ne 0$. Suppose that $m - k$ QR steps are performed with the shifts $\theta_{k+1}, \ldots, \theta_m$. Let $M$ be a positive integer less than or equal to $n - m$ and greater than $k$. If

$AV^+ = V^+H^+ + f^+e_M^T$,

is the length $M$ Arnoldi factorization that results from extending the compressed factorization of equation (4.4.5) and $H^+$ is unreduced then

(4.4.7) $\mathcal{R}(V^+) = \mathrm{Span}\{VQ_1, Az_j, \ldots, A^{M-k}z_j\}$

holds for each Ritz vector $z_j = Vs_j$ such that $Hs_j = s_j\theta_j$ for $j = 1, \ldots, k$. In particular, if the eigenvectors $s_1, \ldots, s_k$ of $H$ are linearly independent, then

(4.4.8) $\mathcal{R}(V^+) = \mathrm{Span}\{z_1, \ldots, z_k, Az_j, \ldots, A^{M-k}z_j\}$.
Proof Partition the eigenvalues of $H$ as in the hypothesis of Theorem 4.4. Let $(s_j, \theta_j)$ be an eigenpair for $H$ where $\|s_j\| = 1$ and set $z_j = Vs_j$. Define $v_{m+1} = f/\|f\|$ and $V^+e_j = v_j^+$ for $j = 1, \ldots, M$. Note that by Theorem 2.4 of Chapter 2 it follows that $v_{k+1}^+ = \psi_k(A)v_1^+$ for some polynomial $\psi_k(\lambda)$ of degree $k$.

Note that by equation (4.4.5) of Theorem 4.4, we have $v_{k+1}^+ = v_{m+1}$. It also follows that $A^iv_{k+1}^+ = A^i\psi_k(A)v_1^+ \in \mathcal{K}_{k+i+1}(A, v_1^+)$ for $i = 1, \ldots, M - k - 1$ which implies that

(4.4.9) $\mathrm{Span}\{VQ_1, v_{k+1}^+, \ldots, A^{M-k-1}v_{k+1}^+\} \subseteq \mathcal{R}\{V^+\}$.

We now show that these two sets share the same dimension. Suppose that $VQ_1y_1 + K_{M-k}(A, v_{k+1}^+)y_2 = 0$ for some $y = [\, y_1^T \ y_2^T \,]^T \in \mathbf{R}^M$. Thus, there exists a polynomial $\psi(\lambda)$ of degree less than $M$ so that $\psi(A)v_1^+ = 0$. However, since $H^+$ is unreduced the grade of $v_1^+$ is at least $M$ and hence $y = 0$ which implies that the two sets in equation (4.4.9) are equal.

Using mathematical induction we show that

$A^iz_j \in \mathrm{Span}\{z_j, v_{k+1}^+, \ldots, A^{i-1}v_{k+1}^+\}$

for $i = 1, \ldots, M - k$. From the length $m$ Arnoldi factorization, it follows that

$Az_j = z_j\theta_j + f(e_m^Ts_j) = z_j\theta_j + v_{m+1}(e_m^Ts_j)\|f\| \in \mathrm{Span}\{z_j, v_{k+1}^+\}$,

establishing the base case. Suppose that the result is true for positive integers $i - 1$. The inductive hypothesis implies that

$A^iz_j = AA^{i-1}z_j$

$\in A\,\mathrm{Span}\{z_j, v_{k+1}^+, \ldots, A^{i-2}v_{k+1}^+\}$

$\subseteq \mathrm{Span}\{z_j, v_{k+1}^+, \ldots, A^{i-1}v_{k+1}^+\}$,

and the desired result follows. Now, since $z_j \in \mathcal{R}\{VQ_1\}$ and $v_{k+1}^+ = \psi_k(A)v_1^+$ it follows from the established equality of the two sets in equation (4.4.9) that

(4.4.10) $\mathrm{Span}\{VQ_1, Az_j, \ldots, A^{M-k}z_j\} \subseteq \mathcal{R}\{V^+\}$.

Using a similar argument as the one that followed equation (4.4.9), the two sets in equation (4.4.10) are equal. The first conclusion of the theorem in equation (4.4.7) is proved and the second one in equation (4.4.8) easily follows when the eigenvectors of $H$ are linearly independent. □
The Krylov subspace of length $k + p$ generated during a cycle of Algorithm 4.2 using exact shifts contains all the Krylov subspaces of dimension $p + 1$ generated from a wanted Ritz vector:

$\mathcal{K}_{p+1}(A, z_i^{(j)}) \subset \mathcal{K}_{k+p}(A, v_1^{(j+1)})$,

for each Ritz vector $z_i^{(j)}$ corresponding to the $k$ wanted Ritz values $\theta_1^{(j)}, \ldots, \theta_k^{(j)}$. Morgan infers that the method builds an orthogonal basis for a Krylov subspace without favoring any particular Ritz vector.


The next result shows that the polynomial implicitly applied by an IRA-iteration using exact shifts is of minimal degree when we wish to re-start an Arnoldi factorization with a vector that is a linear combination of wanted spectral information of $H_{k+p}^{(j)}$.


Theorem 4.6 Assume the same hypothesis of Theorem 4.4 with the addition that the eigenvalues of $H$ are distinct. Let

$\psi(\lambda) = \prod_{j=k+1}^{m} (\lambda - \theta_j)$

and denote the Ritz vectors by $z_j = Vs_j$ where $Hs_j = s_j\theta_j$. If $v_1^+ \in \mathrm{Span}(z_1, \ldots, z_k)$ then for some polynomial $\phi(\lambda)$ of degree not exceeding $m - 1$

$v_1^+ = \phi(A)v_1$,

where $\phi(\lambda) = \psi(\lambda)\chi(\lambda)$ for some polynomial $\chi(\lambda)$ of degree at most $k - 1$.

Proof Let $z_j \in \mathcal{K}_m(A, v_1)$. Then, for every $j$, there is a polynomial $p_j(\lambda)$ of degree not exceeding $m - 1$ such that $z_j = p_j(A)v_1$. Thus $v_1^+ = \phi(A)v_1$ where the degree of $\phi(\lambda)$ does not exceed $m - 1$. Using Lemma 4.1 it follows that $v_1^+ = \phi(A)v_1 = \phi(A)Ve_1 = V\phi(H)e_1$. Expand $e_1 = s_1\zeta_1 + \cdots + s_m\zeta_m$ and hence $\phi(H)e_1 = s_1\phi(\theta_1)\zeta_1 + \cdots + s_m\phi(\theta_m)\zeta_m$. Since $v_1^+ \in \mathrm{Span}(z_1, \ldots, z_k)$ it follows that $\phi(\theta_j)\zeta_j = 0$ for $j = k + 1, \ldots, m$.

Denote the left eigenvectors of $H$ by $u_j$ indexed so that $u_j^HH = \theta_ju_j^H$. Since the eigenvalues of $H$ are distinct, the biorthogonality of the left and right eigenvectors of $H$ gives that $u_j^He_1 = u_j^Hs_j\zeta_j$ and $u_j^Hs_j \ne 0$ for $j = 1, \ldots, m$. Lemma 2.1 of Chapter 2 implies that $u_j^He_1 \ne 0$ and hence $\zeta_j \ne 0$ and so $\phi(\theta_j) = 0$ for $j = k + 1, \ldots, m$. Thus $\psi(\lambda)$ must be a divisor of $\phi(\lambda)$ and the theorem is proved. □
Theorem 4.4 implies that an IRA-iteration using the exact shift strategy builds a
new  Arnoldi  factorization  using  only  the  wanted  spectral  information  from  a  previous 
Arnoldi  factorization.  Theorem  4.6  states  that  any  other  re-started  scheme  that  uses 
spectral  information  from  an  Arnoldi  factorization  introduces  unwanted  components 
if  the  degree  of  the  polynomial  is  greater  than  m  —  k.  From  equation  (4.4.5),  an 
IRA-iteration  with  the  exact  shift  strategy  uses  a  linear  combination  of  the  first  m  —  k 
columns  of  V. 

Further research on alternate shifting strategies is needed. In particular, the implicit application of the Chebyshev and least squares filtering techniques of Saad [76, 77] should be investigated. Calvetti, Reichel, and Sorensen [17] have examined the use of Leja points during an implicitly re-started Lanczos iteration.


4.4.3  Explicitly  Re-starting  with  Schur  Vectors 


Scott [80] presents an interesting version of a deflated ERA-iteration discussed at the end of § 4.1. Suppose the first $l - 1$ columns of an Arnoldi factorization are approximate Schur vectors that satisfy the convergence criterion. At every cycle of the iteration, an Arnoldi factorization of length $k + p - l + 1$ is built where the $l - 1$ approximate Schur vectors occupy the leading portion of the factorization. For example, consider the $j$-th cycle of the iteration where

$V_{k+p}^{(j)} = [\, V_{l-1} \ \ V_{k+p-l+1}^{(j)} \,]$ and $H_{k+p}^{(j)} = \begin{bmatrix} T_{l-1} & M_{l-1} \\ 0 & H_{k+p-l+1}^{(j)} \end{bmatrix}$

are generated. The leading portions of both $V_{k+p}^{(j)}$ and $H_{k+p}^{(j)}$ define an approximate partial real Schur decomposition of $A$. Let $H_{k+p-l+1}^{(j)}Z_{k+p-l+1}^{(j)} = Z_{k+p-l+1}^{(j)}T_{k+p-l+1}^{(j)}$ be a real Schur decomposition ordered so that the wanted eigenvalues are in the leading portion of $T_{k+p-l+1}^{(j)}$. When the first column of $V_{k+p-l+1}^{(j)}Z_{k+p-l+1}^{(j)}$ satisfies the convergence criterion, it is accepted as the $l$-th approximate Schur vector.

Scott's version differs from Algorithm 4.7 given below at Line 3.2. The real Schur decomposition computed by Scott is ordered with the eigenvalues in descending order of magnitude along the diagonal blocks of $T_{k+p-l+1}^{(j)}$. Scott also provides what appear to be robust implementations of almost all the major explicitly re-started Arnoldi variants: block, Chebyshev acceleration, and pre-conditioned Arnoldi. The resulting software is available as the code EB12 in the Harwell Subroutine Library [2]. The following procedure summarizes Scott's re-started single vector approach.


Algorithm 4.7

Input: A unit vector $v_1^{(1)}$.

1.1 For $l = 1, \ldots, k$

    2.1 For $j = 1, 2, \ldots$ until convergence

        3.1 Build an Arnoldi factorization of length $k + p - l + 1$ given a starting vector $v_l^{(j)}$ in the $l$-th column of $V_{k+p}^{(j)}$:

            $AV_{k+p}^{(j)} = V_{k+p}^{(j)}H_{k+p}^{(j)} + f_{k+p}^{(j)}e_{k+p}^T$;

        3.2 Compute the real Schur decomposition:

            $H_{k+p-l+1}^{(j)}Z_{k+p-l+1}^{(j)} = Z_{k+p-l+1}^{(j)}T_{k+p-l+1}^{(j)}$

            ordered so that the wanted eigenvalues are in the leading portion of $T_{k+p-l+1}^{(j)}$;

        3.3 Set

            $Z_{k+p}^{(j)} = \begin{bmatrix} I_{l-1} & 0 \\ 0 & Z_{k+p-l+1}^{(j)} \end{bmatrix}$ and $T_{k+p}^{(j+1)} = (Z_{k+p}^{(j)})^TH_{k+p}^{(j)}Z_{k+p}^{(j)}$;

        3.4 Update the length $k + p$ Arnoldi factorization of Line 3.1:

            $AV_{k+p}^{(j)}Z_{k+p}^{(j)} = V_{k+p}^{(j)}Z_{k+p}^{(j)}T_{k+p}^{(j+1)} + f_{k+p}^{(j)}e_{k+p}^TZ_{k+p}^{(j)}$;

        3.5 Obtain a length $l$ Arnoldi factorization by retaining only the first $l$ columns of the factorization in Line 3.4:

            $AV_l^{(j+1)} = V_l^{(j+1)}H_l^{(j+1)} + f_l^{(j+1)}e_l^T$;

        3.6 If the $l$-th column of $V_l^{(j+1)}$ converges as a Ritz vector, increase $l$ by one and go to Line 2.3.

    2.2 End For

    2.3 If $l = k$ then stop.

1.2 End For


During each cycle of the iteration in Algorithm 4.7, the Arnoldi factorization is re-started with $V_{k+p-l+1}^{(j)}Z_{k+p-l+1}^{(j)}e_1$ while maintaining orthogonality against the approximate Schur vectors already computed. Equating the last $k + p - l + 1$ columns of the length $k + p$ Arnoldi factorization of Line 3.1 results in

(4.4.11) $AV_{k+p-l+1}^{(j)} = V_{l-1}M_{l-1} + V_{k+p-l+1}^{(j)}H_{k+p-l+1}^{(j)} + f_{k+p}^{(j)}e_{k+p-l+1}^T$.

Using the orthogonality of the columns of $V_{k+p}^{(j)}$ gives that $M_{l-1} = V_{l-1}^TAV_{k+p-l+1}^{(j)}$ and hence

(4.4.12) $(I - V_{l-1}V_{l-1}^T)AV_{k+p-l+1}^{(j)} = V_{k+p-l+1}^{(j)}H_{k+p-l+1}^{(j)} + f_{k+p}^{(j)}e_{k+p-l+1}^T$.
This prompts Saad [78, page 182] to make the observation that the Hessenberg matrix $H_{k+p-l+1}^{(j)}$ of the deflated Arnoldi factorization of equation (4.4.11) appears at the front of the Arnoldi factorization applied to $(I - V_{l-1}V_{l-1}^T)A$. Thus,

$V_{k+p-l+1}^{(j)}Z_{k+p-l+1}^{(j)}e_1 = K_{k+p-l+1}((I - V_{l-1}V_{l-1}^T)A, v_l^{(j)})\,c_{k+p-l+1}$,

where $v_l^{(j)} = V_{k+p}^{(j)}e_l$; a polynomial of degree at most $k + p - l$ in $(I - V_{l-1}V_{l-1}^T)A$ is applied to the starting vector $V_{k+p}^{(j)}e_l$.

In theory, there is no difference between explicitly and implicitly re-starting an Arnoldi iteration. However, the numerical behavior of mathematically equivalent schemes may be quite different. An example of this was given in § 4.3 comparing the ERA- and IRA-iterations. A more comprehensive numerical study comparing Algorithm 4.7 and Algorithm 4.2 is planned [48]. Another alternative is the work of Baglama, Calvetti and Reichel [4]. They discuss a deflated implicitly re-started Lanczos iteration using Leja shifts.
Chapter 5


Numerical  Stability  of  an  IRA-iteration 


This  chapter  examines  the  particulars  of  computing  an  IRA-iteration  in  finite  precision 
arithmetic.  The  underlying  theme  of  this  thesis  is  that  QR-  and  IRA-iterations  are 
one  and  the  same.  The  chapter  discusses  the  numerical  stability  of  an  IRA-iteration 
by  appealing  to  that  of  the  QR-iteration. 

The concepts of the backward and forward stability of the QR algorithm are explained in § 5.1. The relevant perturbation theory associated with the matrix eigenvalue problem is the subject of § 5.2. The forward instability of the QR algorithm is taken up in § 5.3. A connection is made with the algorithms used to re-order the Schur form of a matrix in § 5.4. The final section of the chapter presents a sensitivity analysis of orthogonal reductions of a matrix to upper Hessenberg form.

5.1  Backward  and  Forward  Stability  of  the  QR  Algorithm 

Robust implementations of a practical QR algorithm, such as those found in the software packages EISPACK [82] and LAPACK [1], compute a real Schur form for a matrix $A \in \mathbf{R}^{n \times n}$ such that

(5.1.1) $(A + E)\tilde{Q} = \tilde{Q}\hat{R}$,

where $\tilde{Q}^T\tilde{Q} = I$ is exactly orthogonal and $\|E\| \approx \epsilon_M\|A\|$. The machine precision is denoted by $\epsilon_M$. The upper quasi-triangular matrix $\hat{R}$ is that computed by a robust implementation of the QR algorithm. The computed orthogonal matrix $\hat{Q}$ satisfies $\|\hat{Q}^T\hat{Q} - I\| \approx \epsilon_M$. In other words, the real Schur form of a matrix near $A$ is computed. This is what makes the QR algorithm backward stable.

Suppose that the same algorithm is computed in exact arithmetic. Let $AQ = QR$ denote this ideal computation. Assume that the ordering of the eigenvalues on the diagonal of $\hat{R}$ and $R$ is the same. We emphasize that it does not follow that

$\|\hat{R} - R\| \approx \epsilon_M\|A\|$.

Indeed, the diagonal elements of $\hat{R}$ and $R$ may have few if any digits of agreement. If, on the other hand, the ratio of the above norm difference and the norm of $A$ is on the order of machine precision, then the QR algorithm is forward stable.

In particular, consider one step of the shifted QR-iteration. Suppose $H$ is an unreduced upper Hessenberg matrix. As discussed above, the computed output results in

$(H + E)\hat{Q} \approx \hat{Q}\hat{H}^+$,

where $\|\hat{Q}^T\hat{Q} - I\| \approx \epsilon_M$ and $\|E\| \approx \epsilon_M\|H\|$. Let $HQ = QH^+$ be the exact QR-step computed in exact arithmetic. Is it reasonable to expect that

$\|\hat{H}^+ - H^+\| \le \epsilon_M\|H^+\|$?

As we shall see, the shifted QR algorithm may be very sensitive to the shift. Equivalently, orthogonal reductions to upper Hessenberg form may be very sensitive to tiny perturbations in the starting vector.

5.2  Perturbation  Theory 

This section briefly addresses the question that a perturbation theory answers: how do an eigenvalue and eigenvector change subject to changes in the matrix? An understanding of these issues is important since it helps us determine the accuracy of the eigenvalue approximations computed.

The analysis of § 2.5 of Chapter 2 shows that when the product of the last component of a normalized eigenvector for $H_k$ and the norm of $f_k$ is suitably small, the IRA-iteration has computed an approximate eigenpair. If $H_ks = s\theta$ then $(A + E)x_r = x_r\theta$ with $E = -(e_k^Ts)f_kx_r^T$, where $x_r = V_ks$ is the Ritz vector. It follows that $\|E\| = |e_k^Ts|\,\|f_k\|$, the size of the backward error, bounds the distance to the nearest matrix that has the Ritz pair $(x_r, \theta)$ as an eigenpair. The following theorem indicates what accuracy might be expected for an eigenvalue of $A$.
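The backward error statement above can be verified directly. This is a minimal sketch under stated assumptions: the helper `arnoldi`, the random nonsymmetric test matrix and the sizes are invented for the example. Because the Ritz pairs of a nonsymmetric matrix may be complex, the code uses the conjugate transpose $x_r^H$ where the text writes $x_r^T$.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Return V, H, f with A V = V H + f e_m^T (V has m orthonormal columns)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = A @ V[:, j]
        h = V[:, :j + 1].T @ w
        w = w - V[:, :j + 1] @ h
        c = V[:, :j + 1].T @ w          # one reorthogonalization pass
        w = w - V[:, :j + 1] @ c
        H[:j + 1, j] = h + c
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(w)
            V[:, j + 1] = w / H[j + 1, j]
    return V, H, w

rng = np.random.default_rng(1)
n, m = 30, 8                             # illustrative sizes (assumed)
A = rng.standard_normal((n, n))          # nonsymmetric test matrix
V, H, f = arnoldi(A, rng.standard_normal(n), m)

theta, S = np.linalg.eig(H)              # columns of S have unit 2-norm
X = V @ S                                # Ritz vectors x_r = V s
residual = np.linalg.norm(A @ X - X * theta, axis=0)
estimate = np.abs(S[-1, :]) * np.linalg.norm(f)   # |e_m^T s| ||f||

# Backward error matrix for the first Ritz pair: (A + E) x_r = x_r theta.
x, s, th = X[:, 0], S[:, 0], theta[0]
E = -s[-1] * np.outer(f, x.conj())
print(np.max(np.abs(residual - estimate)))
print(np.linalg.norm((A + E) @ x - x * th), np.linalg.norm(E, 2) - estimate[0])
```

The residual norm of each Ritz pair equals the Ritz estimate, so the estimate can be monitored without forming the Ritz vectors or any additional products with $A$.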

Theorem 5.1 Suppose that $\lambda$ is an eigenvalue of $A$ nearest the eigenvalue $\theta$ of $A + E$. Denote the left and right eigenvectors for $\lambda$ by $y$ and $x$, respectively, each of unit length. Then

$|\lambda - \theta| \le \frac{\|E\|}{|y^Hx|} + O(\|E\|^2)$.

Proof See page 68 of Wilkinson [101]. □

The secant of the angle between $x$ and $y$, the reciprocal of $|y^Hx|$, determines the conditioning of $\lambda$. If the left and right eigenvectors are nearly orthogonal, then even if $\|E\| \approx \epsilon_M\|A\|$, where $\epsilon_M$ is machine precision, $\theta$ may contain few digits, if any, of accuracy. Note that if $A$ is symmetric, then $x = y$ and $\theta$ is an excellent approximation to $\lambda$.
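A two-by-two example shows how sharply the bound of Theorem 5.1 can bite. The matrix, the perturbation and its size are assumptions chosen so that the left and right eigenvectors for $\lambda = 1$ are nearly orthogonal; left eigenvectors are obtained here as eigenvectors of the transpose, which is valid because the matrix is real with real eigenvalues.

```python
import numpy as np

A = np.array([[1.0, 1.0e4],
              [0.0, 2.0]])
lam = 1.0

# Right and left unit eigenvectors for the eigenvalue lam.
w, X = np.linalg.eig(A)
x = X[:, np.argmin(np.abs(w - lam))]
wl, Y = np.linalg.eig(A.T)
y = Y[:, np.argmin(np.abs(wl - lam))]
cond = 1.0 / abs(np.vdot(y, x))          # secant of the angle between x and y

# A perturbation with a single nonzero entry, so ||E||_2 = eps.
eps = 1.0e-8
E = np.array([[0.0, 0.0],
              [eps, 0.0]])
theta = np.linalg.eigvals(A + E)
theta = theta[np.argmin(np.abs(theta - lam))]

change = abs(theta - lam)                # observed movement of the eigenvalue
bound = eps * cond                       # first order bound ||E|| / |y^H x|
print(cond, change, bound)
```

A perturbation of size $10^{-8}$ moves the eigenvalue by roughly $10^{-4}$, which is the amplification by $1/|y^Hx| \approx 10^4$ that the theorem predicts to first order.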

The question of how close the Ritz vector $x_r$ is to $x$ is complicated by the fact that an eigenvector is not a unique quantity. Any scaling of an eigenvector by a complex number of unit modulus remains one.


Theorem 5.2 Suppose that

$AQ = Q\begin{bmatrix} \lambda & r_{12}^T \\ 0 & B_{22} \end{bmatrix}$

is a Schur form for $A$ and let $\lambda$ be the eigenvalue of $A$ nearest the eigenvalue $\theta$ of $A + E$. If $\varphi$ measures the positive angle between $x$ and $x_r$ then

$\varphi \le \frac{2\|E\|}{\mathrm{sep}(\lambda, B_{22})} + O(\|E\|^2)$.

Proof See Lemma 7.8 of Demmel [23]. □

Varah [94] shows that

$\mathrm{sep}(\lambda, B_{22}) \le \min_{\lambda_i \ne \lambda} |\lambda - \lambda_i|$,

(5.2.1) $\mathrm{sep}(\lambda, B_{22}) \le \|r_{12}\|\,\frac{|y^Hx|}{\sqrt{1 - |y^Hx|^2}}$,

where the latter bound is only defined for nonzero $r_{12}$. Thus, the conditioning of the eigenvector problem depends upon both the distance to the other eigenvalues of $A$ and the sensitivity of $\lambda$. Varah also notes that both upper bounds may be significant overestimates. Note that when $A$ is symmetric, $r_{12} = 0$ and it may be shown that the first bound is an equality. The conclusion we must draw is that the computation of the eigenvalues for a nonsymmetric matrix is potentially an ill conditioned process.

Multiple eigenvalues and clusters of eigenvalues cause further complications and the answer is to study the conditioning of invariant subspaces. In fact, if the angle between the left and right eigenvector approaches ninety degrees, then equation (5.2.1) implies that $\lambda$ is not a distinct eigenvalue of $A$. The same result is essentially proved by Wilkinson [102]. He shows that if $\lambda$ is a distinct eigenvalue of $A$, then a perturbation matrix $F$ exists so that $\lambda$ is a repeated eigenvalue of $A + F$ and $\|F\| \le |y^Hx| / \sqrt{1 - |y^Hx|^2}$. If $|y^Hx|$ is equal to zero then $\lambda$ is a repeated eigenvalue of $A$.
Saad [78] presents an excellent comprehensive introduction to perturbation theory within the context of large scale eigenvalue problems. The works of Chatelin [18] and Stewart and Sun [90] are sources for more general study with many citations to the literature. In particular, the work of Bai, Demmel, and McKinney [7] examines the construction of the LAPACK software used to estimate the various condition numbers.

Finally, the possible ill conditioning of the nonsymmetric eigenvalue problem leads Toh and Trefethen to suggest that the Arnoldi iteration be used to estimate the pseudospectra of a matrix [93]. The eigenvalues of $A + E$ where $\|E\| \le \epsilon$ are members of $A$'s pseudospectra.

5.3  Forward  Instability  of  the  QR  Algorithm 

This section investigates how the theory of Chapter 2 behaves when computing in floating point arithmetic. By understanding what causes the forward instability of the QR algorithm, we may possibly prevent its deleterious effects. These include:

• Introducing perturbations that lead to unnecessary loss of accuracy in the computed spectral information.

• Increasing the number of iterations required for convergence.

Since the last two chapters demonstrate that the IRA-iteration is equivalent to the QR-iteration, we are directly led to an understanding of the effect of applying shifts during a cycle of Algorithm 4.2.

Parlett  and  Le  [63]  carefully  examine  the  forward  instability  of  the  QR  algorithm 
on  symmetric  tridiagonal  matrices.  However,  we  shall  see  that  their  results  appear  to 
carry  over  directly  to  the  QR  algorithm  on  upper  Hessenberg  matrices.  The  analysis 
and  numerical  experiments  suggest  a  sensitivity  analysis  for  the  orthogonal  reduction 
of  a  matrix  to  upper  Hessenberg  form. 

Suppose, for the moment, that $H \in \mathbb{R}^{n \times n}$ is an unreduced symmetric
tridiagonal matrix and set $\hat{H} = RQ + \tau I$ where $QR = H - \tau I$ is a QR
factorization. Denote by $H_k$ the leading principal matrix of order k of $H = H_n$. The
main result proved by Parlett and Le is a necessary and sufficient condition for the onset
of forward instability. The instability occurs if and only if the shift $\tau$ is close to
an eigenvalue of $H_k$ with a small last component of the corresponding normalized
eigenvector. Parlett and Le present numerous examples illustrating the forward
instability. Before continuing, we present three examples for the nonsymmetric problem
that serve to motivate these ideas.

Consider the matrix

\[
H = \begin{bmatrix}
3 & 1 & 1 & -1 \\
1 & 3 & -1 & 1 \\
0 & 10^{-12} & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix}. \tag{5.3.1}
\]

Table 5.1 displays spectral information of H; the notation $\omega_{i,4}$ stands for the
last component of the i-th normalized eigenvector of H.

Suppose that two separate explicit QR steps are performed on H with shifts 3 and
4. Computing in MATLAB, Version 4.2a, on a Sun SPARCstation IPX results in


(5.3.2)  $\hat{H}(3) \approx$


3  -1  -1.4  -l.l-lO"10 

-1  3  -1.4  —7.1  -  10-13 

0  1  1  -10"12 

0  0  0  3 


and 


(5.3.3) 


H(  4) 


r  2 

-1.4 

1.4 

-3.2 -lO"4  ] 

O 

1 

h-4 

w 

2 

1 

1 

-T 

O 

1 

0 

1 

2 

6.7  ■  10*4 

0 

0 

1 — 1 
O 

I 

3.9 

where $\hat{H}(\tau) = R(\tau)Q(\tau) + \tau I$. The floating point arithmetic is IEEE
standard double precision with machine precision of
$\epsilon_M = 2^{-52} \approx 2.2204 \cdot 10^{-16}$. The results of equation (5.3.3) are
in stark contrast to Lemma 3.1 of Chapter 3 whereas those of equation (5.3.2) conform.
The last property of Lemma 3.1 implies that for shifts that are nearly eigenvalues of H,
the last row $e_n^T (R(\tau)Q(\tau) + \tau I) \approx \lambda e_n^T$, where $\lambda$ is
an eigenvalue of H. We note that the eigenvalues of both matrices are still equal to
those of Table 5.1.

  i   Eigenvalue        Condition number   \omega_{i,4}
  1   4                 1.2                O(10^{-13})
  2   1.999999999999    1.2                O(10^{-13})
  3   1.000000000001    1.5                O(10^{-1})
  4   3                 2.8                O(10^{-1})

Table 5.1  Eigenvalues and some sensitivity measures for H.
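The two explicit steps can be reproduced with a short NumPy sketch. It assumes the reading of (5.3.1) with entries (1,3) = 1 and (1,4) = -1, and it uses `numpy.linalg.qr` (Householder based) in place of the MATLAB 4.2a computation reported above; the helper name `explicit_qr_step` is introduced here for illustration only. The rounding pattern, and hence the size of any spurious entries in the last row for the shift 4, depends on the platform and library, so the printed values need not match (5.3.3).

```python
import numpy as np

def explicit_qr_step(H, tau):
    """One explicit shifted QR step: factor H - tau*I = QR, return RQ + tau*I."""
    n = H.shape[0]
    Q, R = np.linalg.qr(H - tau * np.eye(n))
    return R @ Q + tau * np.eye(n)

# The matrix H of equation (5.3.1), as read from the text (an assumption).
H = np.array([[3.0, 1.0,   1.0, -1.0],
              [1.0, 3.0,  -1.0,  1.0],
              [0.0, 1e-12, 2.0,  1.0],
              [0.0, 0.0,   1.0,  2.0]])

H3 = explicit_qr_step(H, 3.0)
H4 = explicit_qr_step(H, 4.0)

np.set_printoptions(precision=2, linewidth=120)
print("H(3) =\n", H3)
print("H(4) =\n", H4)
# In exact arithmetic both entries below vanish, because 3 and 4 are eigenvalues of H.
print("last sub-diagonal entries:", H3[3, 2], H4[3, 2])
```

Both results are orthogonal similarity transformations of H up to rounding, so their eigenvalues agree with those of H even when individual matrix entries do not behave as the exact theory predicts.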

Let $(s_i^{(k)}, \lambda_i^{(k)})$ be an eigenpair for $H_k$, the leading principal
sub-matrix of order k of H, and let $\omega_{i,k} = e_k^T s_i^{(k)}$ be the last component
of the corresponding eigenvector. Assume that $s_i^{(k)}$ is a unit vector for
k = 1, ..., n. Parlett and Le's analysis formally extended for an unreduced Hessenberg
matrix states that there are entries of $\hat{H}(\tau) = R(\tau)Q(\tau) + \tau I$ whose
derivatives are $O(1/\omega_{i,k})$ with respect to changes in $\tau$ when $\tau$ is
nearly equal to $\lambda_i^{(k)}$. This analysis is corroborated when $\tau = 4$ since it
is an eigenvalue of $H_2$ and $\omega_{i,k} \approx 10^{-13}$. The last sub-diagonal
element of $\hat{H}(4)$ should be on the order of machine precision; however
$|\hat{\beta}_4| \approx 10^{13} \epsilon_M$. Parlett and Le also observe that a small
$\omega_{i,k}$ is an indicator that the first k columns of $H - \lambda_i^{(k)} I$ are
almost linearly dependent. Since $H - \lambda_i^{(k)} I = QR$, it follows that the
condition number of $H_j - \lambda_i^{(k)} I_j$ is that of $R_j$ where $R_j$ is the
leading principal sub-matrix of order j of R. The condition numbers
$\kappa(H_j - \tau I_j) = \|H_j - \tau I_j\| \, \|(H_j - \tau I_j)^{-1}\|$ are displayed
in Table 5.2. We believe this geometric interpretation predicting forward instability
should immediately come to mind when considering the size of $\omega_{i,k}$.
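The two indicators discussed above, the last eigenvector components and the condition numbers of the leading triangular blocks, can be estimated with the following sketch. It assumes the reading of (5.3.1) used in this section and takes the condition numbers of $R_j$, following the identification made in the text; the function names are hypothetical.

```python
import numpy as np

# The matrix H of equation (5.3.1), as read from the text (an assumption).
H = np.array([[3.0, 1.0,   1.0, -1.0],
              [1.0, 3.0,  -1.0,  1.0],
              [0.0, 1e-12, 2.0,  1.0],
              [0.0, 0.0,   1.0,  2.0]])

def last_components(H):
    """Eigenvalues of H and |last component| of each unit-norm eigenvector."""
    lam, S = np.linalg.eig(H)          # columns of S are normalized to unit 2-norm
    return lam, np.abs(S[-1, :])

def leading_condition_numbers(H, tau):
    """2-norm condition numbers of R_j, the leading j-by-j blocks of R in H - tau*I = QR."""
    n = H.shape[0]
    R = np.linalg.qr(H - tau * np.eye(n), mode="r")
    return [np.linalg.cond(R[:j, :j]) for j in range(1, n + 1)]

lam, omega = last_components(H)
for l, w in zip(lam, omega):
    print(f"eigenvalue {l.real:.12f}   |omega| = {w:.1e}")

conds4 = leading_condition_numbers(H, 4.0)
conds3 = leading_condition_numbers(H, 3.0)
for j, (c4, c3) in enumerate(zip(conds4, conds3), start=1):
    print(f"j = {j}:  cond(R_j), tau=4: {c4:.1e}   tau=3: {c3:.1e}")
```

A large condition number already at j = 2 for the shift 4, together with a tiny last eigenvector component, is the warning sign that the text associates with forward instability.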

It is instructive to consider performing a QR step on H with an implicitly shifted
variant of the QR algorithm. Let $\tilde{H}(\tau)$ be the computed result of performing
the QR step implicitly with shift $\tau$:

\[
\tilde{H}(4) \approx \begin{bmatrix}
2 & 1.4 & 1.4 & 0 \\
7.1 \cdot 10^{-13} & 2 & 1 & -7.1 \cdot 10^{-13} \\
0 & 1 & 2 & 0 \\
0 & 0 & 0 & 4
\end{bmatrix}. \tag{5.3.4}
\]

Performing the step implicitly prevents the forward instability in this example.

  j   \kappa(H_j - 4 I_j)   \kappa(H_j - 3 I_j)
  1   1                     1
  2   O(10^{12})            O(1)
  3   O(10^{12})            O(1)
  4   O(10^{16})            +\infty

Table 5.2  Condition numbers for the shifted matrices.

  i   Eigenvalue           Condition number   \omega_{i,4}
  1   2.999999999909999    O(10^{2})          O(10^{-1})
  2   .9999990001706701    O(10^{6})          O(10^{-7})
  3   1.00000099982933     O(10^{6})          O(10^{-7})
  4   3                    O(10^{2})          O(10^{-1})

Table 5.3  Eigenvalues and some sensitivity measures for G.

  j   \kappa(G_j - I_j)
  1   1
  2   +\infty
  3   O(10^{12})
  4   O(10^{12})

Table 5.4  Condition numbers for the shifted matrices.


Our second example shows that both explicit and implicit implementations are
sensitive to the shift used. Let

\[
G = \begin{bmatrix}
2 & 1 & -1 & 1 \\
1 & 2 & 1 & -1 \\
0 & 10^{-12} & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix} \tag{5.3.5}
\]

with spectral information given by Table 5.3.
If  Wilkinson’s  shift  is  used,  then  we  obtain 


(5.3.6) 


and 


G(I) 


3 

0 

10-m 

0 

I  •  10-13 

2 

.58 

.82 

0 

1.7 

.67 

-.47 

0 

0 

.94 

2.3 

3 

0 

0 

0 

CO 

rH 

1 

o 

2 

.58 

.82 

0 

1.7 

.67 

-.47 

0 

0 

.94 

2.3 

(5.3.7)

Although Wilkinson's shift shares seven digits of accuracy with two of the eigenvalues
of G, the last sub-diagonal elements of both $\hat{G}$ and $\tilde{G}$ are order unity.
This is predicted by Parlett and Le's analysis since
$10^{-7} \cdot \omega_{i,4}^{-1} = 1 \approx .94$. The condition numbers of the
eigenvalues measure the possible loss of accuracy subject to changes in the matrix
elements. Since the orthogonal matrices effecting the explicit and implicit QR steps are
only numerically so, perturbations are introduced. Sorting the computed eigenvalues
of $\hat{G}$ and $\tilde{G}$ into ascending order gives for i = 1, 2

\[
\frac{\|\hat{G} - \tilde{G}\|}{\kappa(\lambda_i)} = O(10^{-10}) \approx
|\lambda_i(\hat{G}) - \lambda_i(\tilde{G})| = O(10^{-11}),
\]

where $\|\hat{G} - \tilde{G}\| \approx 10^{-4}$ and $\kappa(\lambda_i)$ denotes the
condition number of the eigenvalue. In words, the accuracy of the computed eigenvalues is
essentially the ratio of the norm difference of the two matrices produced by the implicit
and explicit QR-iterations and the condition number of the eigenvalue. Table 5.4 gives
an alternative measure for the amount of forward instability that the QR algorithm
may undergo.

The third and final example shows that small sub-diagonal entries are not needed
for the QR algorithm to undergo forward instability. Let

\[
F = \begin{bmatrix}
200 & 100 & 0 & 1 \\
100 & 200 & 0 & 0 \\
0 & 1 & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix} \tag{5.3.8}
\]

with spectral information given by Table 5.5. We also add that the matrix of
eigenvectors for F has condition number O(1).

  i   Eigenvalue           Condition number   \omega_{i,4}
  1   300.0000056304403    O(1)               O(10^{-6})
  2   99.99994793289591    O(1)               O(10^{-5})
  3   3.001734106345232    O(1)               O(10^{-1})
  4   .9983123303188788    O(1)               O(10^{-1})

Table 5.5  Eigenvalues and some sensitivity measures for F.

Computing an implicit QR step with shift 100 gives


(5.3.9) 


F(100)  « 


.7071  2  1  -.7071 

0  12  O' 

0  0  .7071  100 

Although the relative error of the shift 100 with respect to the nearest eigenvalue of F
is $O(10^{-7})$, the last sub-diagonal element of $\tilde{F}(100)$ is $O(10^{-1})$. Since
the eigenvalue (and eigenvector) problem for F is extremely well conditioned, shifting
with the numerically exact shift $\lambda_2 = 99.99994793289591$ given in Table 5.5
should result in an $O(\epsilon_M)$ term in the last sub-diagonal entry of
$\tilde{F}(\lambda_2)$. Instead,

\[
\tilde{F}(\lambda_2) \approx \begin{bmatrix}
300 & 3.8 \cdot 10^{-6} & .7071 & .0051 \\
.7071 & 2.0001 & 1.0051 & -.7071 \\
0 & 1 & 2 & -.7071 \\
0 & 0 & 6.6 \cdot 10^{-10} & \zeta
\end{bmatrix} \tag{5.3.10}
\]

is computed where $\zeta = 99.99994793290061$. Note that the relative error in $\zeta$ to
$\lambda_2$ is $O(10^{-14})$ but that an order $O(10^{-10})$ element emerges in the last
sub-diagonal entry. Once again, the sensitivity is measured by the reciprocal of
$\omega_{2,4}$, which accounts for the $O(10^{-10})$ entry.

In a study examining the deterioration of forward stability during an implicit
QR step, Watkins [99] investigates the transmission of the shift through the matrix.
Watkins' analysis also shows that small sub-diagonal elements are not reliable
indicators for predicting the loss of forward stability. This is substantiated by the
previous examples. It is also shown that even when the QR step does undergo forward
instability, the shift still manages to get propagated through the entire matrix. The only
manner in which a shift can fail to be transmitted is when it is small and the entries
in the leading portion of the matrix are large. Stewart observed this phenomenon for
the QR algorithm on symmetric tridiagonal matrices [84].

5.3.1  Premature  Deflation

Parlett and Le showed that if forward instability occurs during an implicit QR step,
it is preceded by premature deflation. Before defining premature deflation, we review
some necessary details concerning an implicit QR step. An implicit QR step with a real
shift is calculated by forming $(U_1 \cdots U_{n-1})^T H U_1 \cdots U_{n-1}$ where each
$U_i$ is an orthogonal matrix. The orthogonal matrices most commonly used are plane, or
Givens, rotations. The first rotation is constructed so that
$U_1^T (H - \tau I) e_1 = e_1 \sqrt{(\alpha_1 - \tau)^2 + \beta_2^2}$.
The similarity transformation $U_1^T H U_1$ introduces a nonzero entry, or bulge, in the
(3, 1) entry. The remaining plane rotations chase the bulge successively down the
sub-diagonal.
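The bulge chase just described can be sketched in a few lines of NumPy. This is an illustrative single-shift implementation under stated assumptions, not the code used for the experiments in this chapter: the names `givens` and `implicit_qr_step` are introduced here, and each rotation is applied as a dense 2-by-2 matrix product for clarity rather than efficiency.

```python
import numpy as np

def givens(a, b):
    """Return (c, s) with [[c, s], [-s, c]] @ [a, b] = [r, 0]."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def implicit_qr_step(H, tau):
    """One implicitly shifted QR step on an upper Hessenberg matrix (real shift)."""
    H = np.array(H, dtype=float)                 # work on a copy
    n = H.shape[0]
    x, y = H[0, 0] - tau, H[1, 0]                # first column of H - tau*I
    for i in range(n - 1):
        c, s = givens(x, y)
        G = np.array([[c, s], [-s, c]])
        H[i:i + 2, :] = G @ H[i:i + 2, :]        # rotate rows i, i+1
        if i > 0:
            H[i + 1, i - 1] = 0.0                # the bulge has been annihilated
        H[:, i:i + 2] = H[:, i:i + 2] @ G.T      # rotate columns i, i+1 (creates next bulge)
        if i < n - 2:
            x, y = H[i + 1, i], H[i + 2, i]      # sub-diagonal entry and bulge
    return H

# Matrix (5.3.1) as read from the text (an assumption), shifted by its eigenvalue 4.
H = np.array([[3.0, 1.0,   1.0, -1.0],
              [1.0, 3.0,  -1.0,  1.0],
              [0.0, 1e-12, 2.0,  1.0],
              [0.0, 0.0,   1.0,  2.0]])
np.set_printoptions(precision=2, linewidth=120)
print(implicit_qr_step(H, 4.0))
```

For an unreduced Hessenberg matrix and a shift that is not an eigenvalue, the result agrees with the explicit step up to the signs of rows and columns. How small the last sub-diagonal entry turns out for a nearly exact shift depends on rounding details, which is the subject of this section.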

Suppose that the following 3 x 3 sub-matrix of $(U_1 \cdots U_i)^T H U_1 \cdots U_i$
arises:

              column i      column i+1    column i+2
  row i          x              x             x
  row i+1    \epsilon_1     \hat{\tau}        x
  row i+2        b          \epsilon_2    \alpha_{i+2}

If both $\epsilon_1$ and $\epsilon_2$ are small and $\hat{\tau}$ is nearly equal to the
shift $\tau$ used, then premature deflation has occurred. Watkins shows that the entries
marked by an 'x' and b are not relevant to the analysis. As an example, the sequence of
intermediate matrices computed during the QR step with Wilkinson's shift that results in
$\tilde{G}(1)$ undergoes premature deflation. Starting with the first Givens rotation
designed to annihilate the (2, 1) entry, the sequence is

\[
U_1^T G U_1 \approx \begin{bmatrix}
3 & 0 & 0 & 0 \\
0 & 1 & 1.4 & -1.4 \\
7.1 \cdot 10^{-13} & 7.1 \cdot 10^{-13} & 2 & 1 \\
0 & 0 & 1 & 2
\end{bmatrix},
\]

\[
(U_1 U_2)^T G U_1 U_2 \approx \begin{bmatrix}
3 & 0 & 0 & 0 \\
7.1 \cdot 10^{-13} & 2 & -7.1 \cdot 10^{-13} & 1 \\
0 & -1.4 & 1 & 1.4 \\
0 & 1 & 0 & 2
\end{bmatrix},
\]

and finally $(U_1 U_2 U_3)^T G U_1 U_2 U_3 = \tilde{G}(1)$. Notice that for
$U_1^T G U_1$, the (2,1) entry is zeroed out, the (3,2) entry is small and the shift
emerges in the (2,2) position. This is premature deflation. Parlett and Le's analysis
shows that premature deflation is necessary for the implicitly shifted QR algorithm on
symmetric tridiagonal matrices to undergo forward instability. Watkins demonstrates that
along with premature deflation, certain sub-diagonal entries must undergo a significant
reduction in size after the QR step. This is evident in the above example since
$e_2^T G e_1 / e_2^T \tilde{G}(1) e_1 = O(10^{13})$. It is shown that the only way that a
sub-diagonal element becomes tiny is through a cancellation error.

5.4  Re-ordering  the  Real  Schur  Form  of  a  Matrix

Suppose that the upper Hessenberg matrix $H = H_{k+p}$ computed during a cycle of an
IRA-iteration is reduced to upper quasi-triangular form by the QR algorithm:

\[
Q^T H Q = R = \begin{bmatrix} R_{11} & R_{12} \\ 0 & R_{22} \end{bmatrix}, \tag{5.4.1}
\]

where Q is the orthogonal matrix computed by the algorithm. Equation (5.4.1) is a
real Schur form for H of order k + p where the sub-matrices $R_{11}$ and $R_{22}$ are of
order k and p, respectively. Assume that the spectra of $R_{11}$ and $R_{22}$ are
distinct. In practice, the order in which the computed eigenvalues of H appear on the
diagonal of R depends upon the shifts applied. Two algorithms for re-ordering the real
Schur form of a matrix, an iterative and a direct variant, were presented in § 3.4.4 of
Chapter 3.

The iterative swapping algorithm is equivalent to the implicit re-starting technique
used by the IRA-iteration since both depend upon an implicitly shifted QR step
applied to an unreduced upper Hessenberg matrix to interchange $R_{11}$ and $R_{22}$. The
direct swapping algorithm is equivalent to a deflation technique, locking, presented
in Chapter 6. An orthogonal matrix is constructed from a basis for the invariant
subspace corresponding to $R_{22}$. When this is applied as a similarity transformation
the diagonal blocks of R are swapped. In exact arithmetic, both swapping variants
result in a matrix that is upper quasi-triangular with the blocks interchanged.
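For two 1-by-1 blocks the direct variant reduces to a single plane rotation built from an eigenvector, which the following sketch illustrates. It is a minimal illustration under stated assumptions (a 2-by-2 upper triangular input with either distinct diagonal entries or a nonzero off-diagonal entry); the function name `direct_swap_2x2` is hypothetical and this is not the general block algorithm of Chapter 3.

```python
import numpy as np

def direct_swap_2x2(T):
    """Swap the diagonal entries of a 2x2 upper triangular T by an orthogonal similarity.

    Assumes T[1, 0] == 0 and that (T[0, 1], T[1, 1] - T[0, 0]) is not the zero vector.
    """
    a, b, d = T[0, 0], T[0, 1], T[1, 1]
    x = np.array([b, d - a])                 # eigenvector of T for the eigenvalue d
    r = np.hypot(x[0], x[1])
    c, s = x[0] / r, x[1] / r
    Z = np.array([[c, -s], [s, c]])          # rotation whose first column is x / ||x||
    S = Z.T @ T @ Z
    S[1, 0] = 0.0                            # zero in exact arithmetic
    return S, Z

S, Z = direct_swap_2x2(np.array([[2.0, 1.0], [0.0, 1.0]]))
print(S)                                     # diagonal entries appear in swapped order
```

Because the rotation is determined directly from the invariant subspace, no shift has to be transmitted through the matrix, in contrast to the iterative variant.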

The following example demonstrates that the two variants may produce drastically
different output matrices when computed in floating point arithmetic. We compute
under the same conditions as in the last section. Let

\[
T = \begin{bmatrix} 1 + 10\epsilon_M & 1 \\ 0 & 1 \end{bmatrix}.
\]

An eigenvector corresponding to $\lambda_2 = 1$ is
$\begin{bmatrix} 1 \\ -10\epsilon_M \end{bmatrix}$. Denote by Z the plane rotation
that transforms this eigenvector to a multiple of the first column of the identity matrix
in $\mathbb{R}^{2 \times 2}$. Let

\[
U = \begin{bmatrix} 1 & -5\epsilon_M \\ 10\epsilon_M & 1 \end{bmatrix}
\]

so that U is orthogonal to a small multiple of machine precision. The matrix U acts
as a possible arbitrary orthogonal transformation required by the iterative algorithm.
Let $\hat{T}$ denote the matrix computed by performing one step of the QR-iteration to
the matrix $U^T T U$ with shift equal to $\lambda_1 = 1 + 10\epsilon_M$. We remark that
for matrices of order two, the explicit and implicit formulations of the QR-iteration are
equivalent. The two computed matrices are:

\[
Z^T T Z = \begin{bmatrix} 1 & -1 \\ 0 & 1 + 10\epsilon_M \end{bmatrix},
\qquad
\hat{T} = \begin{bmatrix}
1.400000000000003 & -7.999999999999996 \cdot 10^{-1} \\
2.000000000000002 \cdot 10^{-1} & 6.000000000000001 \cdot 10^{-1}
\end{bmatrix}.
\]

The computed eigenvalues of $\hat{T}$ are

1.000000033320011  and  9.999999666799921 \cdot 10^{-1},

which both lost eight digits of accuracy. If another QR-step is performed on the matrix
$\hat{T}$ with the same shift,

\[
\begin{bmatrix}
1.000000000000003 & 1.000000000000001 \\
1.09 \cdot 10^{-15} & 1
\end{bmatrix}
\]

is computed.

Note that the off-diagonal element is slightly larger than machine precision so that a
standard QR algorithm does not set it to zero. But even if the off-diagonal element
is set to zero, the iterative swapping algorithm fails to interchange the eigenvalues.
Continuing to apply QR-steps with the shift equal to $\lambda_1$ does not result in a
properly interchanged matrix.

The explanation why the iterative algorithm fails to work is simple enough. The
matrix T constructed is poorly conditioned with respect to the eigenvalue problem
since the eigenvectors are nearly aligned. The eigenvalues of $U^T T U$ are

1.000000033320011  and  9.999999666799921 \cdot 10^{-1}.

Thus the small relative errors on the order of machine precision that occur when
computing $U^T T U$ produce a nearby matrix in which both the eigenvalues differ by
eight digits of accuracy. Performing a shifted QR step with $\lambda_1$ incurs forward
instability since the last components of the eigenvectors for $U^T T U$ are on the order
of $\sqrt{\epsilon_M}$. This is the necessary and sufficient condition of Parlett and Le
[63]. Another QR step with the same shift on $\hat{T}$ almost zeros out the sub-diagonal
element since the last components of the eigenvectors for $\hat{T}$ are order $10^{-1}$
and the shift is almost the average of the eigenvalues of $\hat{T}$ and quite close to
both. We emphasize that the loss of accuracy of the computed eigenvalues is one of the
deleterious effects of forward instability.

Bai and Demmel [9] present an example which compares their direct swapping
approach with Stewart's algorithm EXCHNG. The matrix considered is

\[
A(\tau) = \begin{bmatrix}
7.001 & -87 & 39.4\tau & 22.2\tau \\
5 & 7.001 & -12.2\tau & 36.0\tau \\
0 & 0 & 7.01 & -11.7567 \\
0 & 0 & 37 & 7.01
\end{bmatrix}.
\]

When $\tau = 10$, ten QR-iterations are required to interchange the two blocks.
As before, the eigenvalues undergo a loss of accuracy. The iterative swapping
algorithm fails for the matrix A(100). Bai and Demmel give no explanation for the failure
of Stewart's algorithm; the explanation is the same as for the previous example.
Using a direct algorithm, the eigenvalues of A(10) and A(100) are correctly swapped
and the eigenvalues lose only a tiny amount of accuracy.

Bai and Demmel present a rigorous analysis of their direct swapping algorithm.
Although backward stability is not guaranteed, it appears that stability is lost only
when $T_{11}$ and $T_{22}$ are both of order two and have almost indistinguishable
eigenvalues [15]. In this case, the interchange is not performed. Bojanczyk and Van
Dooren [15] present an alternate swapping algorithm that appears to be backward
stable.


5.5  Implications  for  an  IRA-Iteration 

A robust implementation of an IRA-iteration relies upon the proper transmission of
shifts during the implicit shift application. The discussion that followed
Algorithm 4.2 used the convergence theory for the QR-iteration developed in § 3.2 to
conclude that all the sub-diagonal elements of $H_k^{(j+1)}$, not including those
corresponding to complex conjugate pairs, go to zero if the polynomial min-max problem
(3.2.1) of Chapter 3 is approximately solved. In particular, if an exact shift strategy is
used for Algorithm 4.2 in Chapter 4, Theorem 4.4 implies that the sub-diagonal entry
$\beta_{k+1}^{(j+1)}$ is zeroed out during the j-th iteration. However, as the examples in
§ 5.3-5.4 demonstrate, $\beta_{k+1}^{(j+1)}$ may not even be small, let alone negligible.

The theory reviewed and developed in the first three chapters of this thesis presents
an analysis of what occurs in exact arithmetic. Computing in finite precision
arithmetic, however, complicates the situation. The phenomenon of the forward instability
of the QR algorithm examined in the last two sections could have a detrimental effect
upon the accuracy of the computed eigenvalues. Since the IRA-iteration is a truncation
of the implicitly shifted QR algorithm, it also is susceptible to loss of accuracy
through forward instability. This indicates that it may be impossible to filter out
unwanted Ritz values with the implicit re-starting technique in practical computations.
This is the motivation for developing the deflation techniques of Chapter 6. In
particular, using a converged Ritz value as a shift may incur forward instability. Since
the norm of $f_k^{(j+1)}$ is the sub-diagonal entry $\beta_{k+1}^{(j+1)}$, forward
instability may prevent the residual vectors of the successive Arnoldi factorizations from
ever approaching zero.

For example, consider the following thought experiment. Suppose that the exact
shift strategy is used for Algorithm 4.2 and p > 1 shifts are to be applied. According to
Theorem 4.4, the computed k-th sub-diagonal entry $\beta_{k+1}$ should be zero. Computing
in floating point arithmetic, though, gives that all we may expect is that the computed
k-th sub-diagonal entry be on the order of $\epsilon_M$ relative to the norm of the
matrix. However, the forward instability of the QR algorithm may prevent the computed
k-th sub-diagonal entry $\beta_{k+1}$ from becoming small. Application of the first shift
possibly introduces perturbations so that the remaining shifts are no longer eigenvalues
of the updated matrix. Thus, further QR steps may not lead to a negligible $\beta_{k+1}$
after p implicit shifts. The examples of the previous section illustrate this behavior.
The possible ill conditioning of the nonsymmetric eigenvalue problem also exacerbates the
situation since inaccurate eigenvalues may result from the computed errors in the
matrix elements due to forward instability. An obvious, but expensive, solution is to
recompute the eigenvalues of the deflated matrix after every implicit shift application.

5.6  The  Sensitivity  of  the  Hessenberg  Decomposition 

Theorem 2.5 of Chapter 2 determines conditions for a length k truncated Arnoldi
factorization. The following geometric result indicates the dependence of the residual
vector upon the starting one used during the Hessenberg decomposition. Simply
stated, if the starting vector for an Arnoldi factorization, or any other orthogonal
reduction to Hessenberg form, is nearly in an invariant subspace for A of dimension
m, the residual vector associated with the length m Arnoldi factorization may not be
as small as exact arithmetic leads us to expect.

Theorem 5.3  Let $A \in \mathbb{R}^{n \times n}$. Suppose that $A Q_m = Q_m T_m$ is a real
partial Schur factorization of order m, and that $A V_m = V_m H_m$ is a length
m Arnoldi factorization where $H_m$ is unreduced, and that $v_1 = V_m e_1 = Q_m y$, and
let $K_m(A, v_1) c_m = A^m v_1$. If $\hat{v}_1 = v_1 + w$ is a unit vector
with $Q_m^T w = 0$ such that $A \hat{V}_j = \hat{V}_j \hat{H}_j + \hat{f}_j e_j^T$ is the
corresponding Arnoldi factorization with $\hat{V}_j e_1 = \hat{v}_1$, and

\[
\epsilon \equiv \max\left\{ \frac{\|K_m(A, w)\|}{\|K_m(A, v_1)\|},
\frac{\|A^m w\|}{\|A^m v_1\|} \right\} < 1,
\]

then

\[
\hat{\rho}_m \hat{\beta}_{m+1} \le \{1 + 2\kappa_2(K_m(A, v_1))\} \|A^m v_1\| \epsilon
+ O(\epsilon^2), \tag{5.6.1}
\]

where $\hat{\rho}_m = \hat{\beta}_2 \cdots \hat{\beta}_m$.

Proof  Suppose that $A V_j = V_j H_j + f_j e_j^T$ is an Arnoldi factorization with
$v_1 = V_m e_1 = Q_m y$ where $A Q_m = Q_m T_m$ is a real partial Schur factorization of
order m. Let $\hat{v}_1 = v_1 + w$ be a unit vector such that $Q_m^T w = 0$, and
$A \hat{V}_j = \hat{V}_j \hat{H}_j + \hat{f}_j e_j^T$ is the corresponding Arnoldi
factorization with $\hat{V}_j e_1 = \hat{v}_1$.

Using Ruhe's characterization of the Hessenberg decomposition in equation (2.4.3)
of Chapter 2 it follows that

\[
\|A^j v_1 - K_j(A, v_1) c_j\| = \min_{c \in \mathbb{R}^j} \|A^j v_1 - K_j(A, v_1) c\|
= \|r_j\|,
\]

\[
\|A^j \hat{v}_1 - K_j(A, \hat{v}_1) \hat{c}_j\|
= \min_{c \in \mathbb{R}^j} \|A^j \hat{v}_1 - K_j(A, \hat{v}_1) c\| = \|\hat{r}_j\|.
\]

But,

\[
\|\hat{r}_j\| = \|A^j (v_1 + w) - \{K_j(A, v_1) + K_j(A, w)\} \hat{c}_j\|. \tag{5.6.2}
\]

Standard results [35, page 228] on the sensitivity of the least squares problem give

\[
\|\hat{r}_j - r_j\| \le \|A^j v_1\| \{1 + 2\kappa_2(K_j(A, v_1))\} \epsilon
+ O(\epsilon^2).
\]

In particular, when j = m it follows that $r_m = 0$. Theorem 2.3 implies that
the QR factorizations of $K_m(A, v_1)$ and $K_m(A, \hat{v}_1)$ are $V_m R_m$ and
$\hat{V}_m \hat{R}_m$, respectively, where both $R_m$ and $\hat{R}_m$ are nonsingular
upper triangular matrices of order m. Equation (2.4.4) gives
$\|\hat{r}_m\| = \hat{\rho}_m \|\hat{f}_m\| = \hat{\rho}_m \hat{\beta}_{m+1}$, where
$\hat{\rho}_m = e_m^T \hat{R}_m e_m$. The proof of Theorem 2.3 computes the equality
$e_i^T R_m e_i = \beta_2 \cdots \beta_i$ for i = 2, ..., m.  □

The sensitivity of the product of the sub-diagonal elements of the perturbed
Arnoldi factorization depends linearly on $\kappa_2(K_m(A, v_1))$. Since
$\kappa_2(K_{j-1}(A, v_1)) \le \kappa_2(K_j(A, v_1))$, the theorem argues against
building large factorizations. Also note that
$\|A^m v_1\| = \|A^m Q_m y\| = \|T_m^m y\|$.

Suppose the solution to the perturbed least squares problem is
$\hat{c}_j = c_j + \delta c_j$. When j = m, equation (5.6.2) of the proof leads to

\[
\hat{\beta}_2 \cdots \hat{\beta}_{m+1} = \|\hat{r}_m\|
= \|A^m w - K_m(A, v_1) \delta c_m - K_m(A, w) c_m\|,
\]

where second-order terms are ignored. It is this combination of vectors that is
responsible for the possible amplification of the perturbation.
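The amplification can be observed on a small example. The sketch below is an illustration under stated assumptions, not an experiment from the thesis: the diagonal test matrix, the perturbation size `delta` and all function names are hypothetical, and the perturbed starting vector is normalized after adding w, which differs from the setting of Theorem 5.3 only at second order. It also checks the relation $e_i^T R_m e_i = \beta_2 \cdots \beta_i$ used in the proof.

```python
import numpy as np

def arnoldi(A, v1, m):
    """Length-m Arnoldi factorization A V = V H + f e_m^T (Gram-Schmidt, applied twice)."""
    n = A.shape[0]
    V = np.zeros((n, m))
    H = np.zeros((m, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        f = A @ V[:, j]
        for _ in range(2):                       # second pass restores orthogonality
            h = V[:, :j + 1].T @ f
            f = f - V[:, :j + 1] @ h
            H[:j + 1, j] += h
        if j + 1 < m:
            H[j + 1, j] = np.linalg.norm(f)
            V[:, j + 1] = f / H[j + 1, j]
    return V, H, f

def krylov(A, v, m):
    """Krylov matrix K_m(A, v) = [v, Av, ..., A^(m-1) v]."""
    cols = [v]
    for _ in range(m - 1):
        cols.append(A @ cols[-1])
    return np.column_stack(cols)

def subdiag_product(H, f):
    """beta_2 ... beta_m * ||f_m||, that is rho_m * beta_(m+1)."""
    return np.prod(np.diag(H, -1)) * np.linalg.norm(f)

n, m, delta = 8, 3, 1e-6
A = np.diag(np.arange(1.0, n + 1))               # hypothetical test matrix
v1 = np.zeros(n); v1[:m] = 1.0 / np.sqrt(m)      # in the invariant subspace span{e1, e2, e3}
w = np.zeros(n); w[-1] = delta                   # small component outside that subspace
v1_hat = (v1 + w) / np.linalg.norm(v1 + w)

V, H, f = arnoldi(A, v1, m)
Vh, Hh, fh = arnoldi(A, v1_hat, m)
prod, prod_hat = subdiag_product(H, f), subdiag_product(Hh, fh)

K = krylov(A, v1, m)
Amv = np.linalg.matrix_power(A, m) @ v1
Amw = np.linalg.matrix_power(A, m) @ w
eps = max(np.linalg.norm(krylov(A, w, m), 2) / np.linalg.norm(K, 2),
          np.linalg.norm(Amw) / np.linalg.norm(Amv))
bound = (1.0 + 2.0 * np.linalg.cond(K)) * np.linalg.norm(Amv) * eps

print(f"unperturbed product: {prod:.1e}")
print(f"perturbed product:   {prod_hat:.1e}  (perturbation size {delta:.0e})")
print(f"first-order bound:   {bound:.1e}")
```

In this example the perturbed product exceeds the size of the perturbation by a factor that grows with the distance of the perturbing eigenvalue from the wanted ones, while remaining below the first-order bound of the theorem.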

There is an interesting connection between Arnoldi factorizations and moment
matrices that gives a lower bound on the product $\rho_m = \beta_2 \cdots \beta_m$.
Nachtigal [55, page 36] discusses a similar connection between moment matrices and the
nonsymmetric Lanczos process. Since $K_m$ is of full column rank, $K_m^T K_m$ is a
positive definite symmetric matrix. By Theorem 2.3 of Chapter 2,

Im  =  =  ^ m ’ 

where  Km  =  Km(A,i q)  results  in 

Defining  Lm  =  i?,"T,  the  Cholesky  factorization  Mm  =  LrnLjn  is  determined  by  the 
inverse  of  the  Fourier  coefficient  matrix  Rm.  Since  the  i-th  sub-diagonal  element  of 
Hm  is  ^  for  i  =  2,. . .  ,m  define  px  =  1(=  ||tq||).  Thus,  the  reciprocal  of  the  product 
pi  •  •  •  Pi  is  the  i-tli  pivot  used  during  the  Cholesky  factorization  of  the  moment  matrix 
Mm.  A  standard  result  [35,  page  145]  on  the  numerical  stability  of  the  Cholesky 
factorization  implies  that 

and  hence,  (n^(A^_1^)rA^_1biq)_1^2  <  P2  •  •  ■  Pm ■  Note  that  e^l+1i?.m+iem+i  =  0 
since  Ami q  is  a  linear  combination  of  the  columns  of  A"m(A,tq).  Hence  the  Cholesky 
factorization  of  Km+i  is  not  defined  since  the  diagonal  element  eJn+1Lin+1em+i  does 
not exist.

Let $\hat{K}_{m+1}^T \hat{K}_{m+1} = \hat{L}_{m+1} \hat{L}_{m+1}^T$ be the Cholesky
factorization of the perturbed Krylov matrix $K_{m+1}(A, \hat{v}_1)$ using the notation
of Theorem 5.3. Since $e_i^T \hat{L}_{m+1} e_i$ is just the reciprocal of
$e_i^T \hat{R}_{m+1} e_i$ for i = 1, ..., m + 1, the implication is that if
$\hat{v}_1^T A^{2m} \hat{v}_1$ is not large then
$\hat{\beta}_2 \cdots \hat{\beta}_{m+1}$ is not small. Since $\hat{v}_1 = v_1 + w$, it
follows that $\hat{v}_1^T A^{2m} \hat{v}_1$ will not be large when
$e_{m+1}^T \hat{R}_{m+1} e_{m+1}$ is not small, which is precisely the situation that
indicates that forward instability occurred during the orthogonal reduction of A to upper
Hessenberg form.

Finally, we remark that the sensitivity of a Hessenberg decomposition via
orthogonal matrices can help explain the perplexing numerical behavior of the Arnoldi
iteration for computing eigenvalues. Suppose that $A V_m = V_m H_m + f_m e_m^T$ is an
Arnoldi factorization of length m. It is often observed that although k Ritz estimates of
the factorization may be suitably small, the residual vector $f_m$ may not be, even for
values of m slightly larger than or equal to k. Since a step of a shifted QR-iteration is
equivalent to replacing the starting vector, the potential forward instability of the QR
algorithm examined in this chapter may also be explained by Theorem 5.3. Extreme
sensitivity of some of the matrix elements to the shift during a QR step is equivalent
to a starting vector having a small perturbation in an unwanted direction.

Chapter  6

Deflation  Techniques  within  an  IRA-iteration 

The connection between the IRA and QR-iterations motivates us to take advantage
of the well understood deflation rules of the QR algorithm and adapt them to the
former iteration. These deflation techniques are extremely important with respect
to convergence and numerical properties. Deflation rules have contributed greatly to
the emergence of the practical QR algorithm as the method of choice for computing
the eigen-system of dense matrices. This chapter introduces deflation schemes that
may be used within an IRA-iteration. The iteration is designed to compute a selected
subset of the spectrum of A such as the k eigenvalues of largest real part. We refer to
this selected subset as wanted and the remainder of the spectrum as unwanted. As
the iteration progresses, some of the Ritz value approximations to eigenvalues of A
may converge long before the entire set of wanted eigenvalues has. These converged
Ritz values may be part of the wanted or the unwanted portion of the spectrum. In
either case it is desirable to deflate the converged Ritz values and corresponding Ritz
vectors from the unconverged portion of the factorization. If the converged Ritz value
is wanted then it is necessary to keep it in the subsequent Arnoldi factorizations. This
is called locking. If the converged Ritz value is unwanted then it must be removed
from the current and subsequent Arnoldi factorizations. This is called purging. These
notions will be made precise during the course of the chapter. For the moment we
note that the advantages of a numerically stable deflation strategy include:

•  Reduction of the working size of the desired invariant subspace.

•  The ability to determine clusters of nearby eigenvalues without need for a block
Arnoldi/Lanczos method [39, 79, 80].

•  Preventing the effects of the forward instability of the Arnoldi/Lanczos
algorithm discussed in Chapter 5.

Deflating within the IRA-iteration is examined in § 6.1. The deflation scheme
for converged Ritz values is presented in § 6.2. The practical issues associated with
our deflation scheme are examined in § 6.3. These include block generalizations of
the ideas examined in § 6.2 for dealing with clusters of Ritz values, avoiding the use
of complex arithmetic when a complex conjugate pair of Ritz values converges, and
an error analysis. A brief survey of other deflation strategies is given in § 6.5. An
interesting connection with the various algorithms used to re-order a Schur form of a
matrix is presented in § 5.4. Numerical results are presented in § 6.6.


6.1  Deflation  within  an  IRA-iteration 


As the iteration progresses the Ritz estimates (2.5.1) decrease at different rates. When
a Ritz estimate is small enough, the corresponding Ritz value is said to have converged.
The converged Ritz value may be wanted or unwanted. In either case, a
mechanism to deflate the converged Ritz value from the current factorization is desired.
Depending on whether the converged Ritz value is wanted or not, it is useful
to define two types of deflation. Before we do this, it will prove helpful to illustrate
how deflation is achieved. Suppose that after m steps of the Arnoldi algorithm we
have


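(6.1.1)  $A \begin{bmatrix} V_1 & V_2 \end{bmatrix} = \begin{bmatrix} V_1 & V_2 \end{bmatrix} \begin{bmatrix} H_1 & M \\ \epsilon e_1 e_j^T & H_2 \end{bmatrix} + f e_m^T,$

where $V_1 \in \mathbb{R}^{n \times j}$, $H_1 \in \mathbb{R}^{j \times j}$ for $1 \le j < m$. If $\epsilon$ is suitably small then the
factorization decouples in the sense that a Ritz pair $(s, \theta)$ for $H_1$ provides an approximate
eigenpair $(x = V_1 s, \theta)$ with a Ritz estimate of $|\epsilon e_j^T s|$. Setting $\epsilon$ to zero splits a nearby
problem exactly and setting $\epsilon = 0$ is called deflation. If $\epsilon$ is suitably small then all
the eigenvalues of $H_1$ may be regarded as converged Ritz values.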
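
As an illustration of the decoupling described above, the following NumPy sketch builds a small upper Hessenberg matrix with one tiny sub-diagonal entry and checks that a Ritz pair of the leading block has a residual of size $|\epsilon e_j^T s|$. The sizes `m` and `j`, the value `eps` and the random test matrix are assumptions chosen for the example. Only the Hessenberg matrix is modeled here, not the basis $V$ or the residual vector $f$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, j, eps = 6, 2, 1e-6   # illustrative order, split index and coupling (assumed values)

# Upper Hessenberg H of order m whose sub-diagonal entry linking the leading
# j x j block H_1 to the trailing block H_2 equals eps, as in (6.1.1).
H = np.triu(rng.standard_normal((m, m)), k=-1)
H[j, j - 1] = eps
H1 = H[:j, :j]

theta, S = np.linalg.eig(H1)   # Ritz pairs (s, theta) of H_1, possibly complex

residuals, estimates = [], []
for idx in range(j):
    s = S[:, idx]
    x = np.concatenate([s, np.zeros(m - j)])       # s padded with zeros to length m
    residuals.append(np.linalg.norm(H @ x - theta[idx] * x))
    estimates.append(abs(eps * s[-1]))             # |eps * e_j^T s|

# Deflation replaces eps by zero, a perturbation of H whose 2-norm is eps.
H_deflated = H.copy()
H_deflated[j, j - 1] = 0.0
perturbation = np.linalg.norm(H - H_deflated, 2)

print(residuals, estimates, perturbation)
```

The residual of the padded vector against the full matrix agrees with the Ritz estimate up to rounding, which is why a small $\epsilon$ lets the eigenvalues of $H_1$ be treated as converged.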


6.1.1  Locking 

If deflation has taken place and all of the deflated Ritz values are wanted, they are
considered locked. This means that subsequent implicit restarting is done on the basis
$V_2$. The sub-matrices affected during implicit restarting are $M$, $H_2$ and $V_2$. However,
during the phase of the iteration that extends the Arnoldi factorization from k to k+p
steps, all of the columns of $[V_1 \; V_2]$ participate, just as if no deflation had occurred.
This assures that all of the new Arnoldi basis vectors are orthogonalized against
converged Ritz vectors and prevents the introduction of spurious eigenvalues into the
subsequent iteration. Moreover, this provides a means to safely compute multiple
eigenvalues when they are present. A block method is not required if deflation and
locking are used. The concept of locking was introduced by Jennings and Stewart [92]
as a deflation technique for simultaneous iteration.

6.1.2  Purging 

If  deflation  has  occurred  but  some  of  the  deflated  Ritz  values  are  unwanted,  a  further 
mechanism,  purging,  must  be  introduced  to  remove  the  unwanted  Ritz  values  and 
corresponding  vectors  from  the  factorization.  In  exact  arithmetic  this  would  not  be 
necessary because the implicit shift technique would accomplish the removal of the
unwanted  Ritz  pair  from  the  leading  portion  of  the  iteration.  However,  computing 
with  finite  precision  arithmetic  may  make  it  impossible  to  accomplish  the  removal 
because  of  the  forward  instability  [63,  99]  of  the  QR  algorithm  discussed  in  Chapter  5. 
The  basic  idea  of  purging  is  perhaps  best  explained  with  the  case  of  a  single  deflated 
Ritz  value. 

Let $j = 1$ in (6.1.1) and equate the first columns of both sides to obtain

(6.1.2)  $A v_1 = v_1 \alpha_1 + \epsilon V_2 e_1,$

where $v_1 = V_1 e_1$ and $H_1 = \alpha_1$. Equation (6.1.2) is an Arnoldi factorization of length
one. The Ritz value $\alpha_1$ has Ritz estimate $|\epsilon|$.

Equating the last $m-1$ columns of (6.1.1) results in

(6.1.3)  $A V_2 = V_1 M + V_2 H_2 + f e_{m-1}^T.$

Suppose that $\alpha_1$ represents an unwanted Ritz value. If A were symmetric then
$M = \epsilon e_1^T$ and equation (6.1.3) becomes

$(A + E) V_2 = V_2 H_2 + f e_{m-1}^T,$

where $E = -\epsilon v_1 (V_2 e_1)^T - \epsilon (V_2 e_1) v_1^T$. A simple derivation shows that $\|E\| = \epsilon$ and
hence equation (6.1.3) defines a length $m-1$ Arnoldi factorization for a nearby
problem. The unwanted Ritz pair $(v_1, \alpha_1)$ may be purged from the factorization
simply by taking $V = V_2$ and $H = H_2$ and setting $M = 0$ in (6.1.3). If A is not
symmetric, the $1 \times (m-1)$ matrix $M$ couples $v_1$ to the rest of the basis vectors $V_2$.
This vector may be decoupled using the standard Sylvester equation approach [9, 35].
Purging then takes place as in the symmetric case. However, the new set of basis
vectors must be re-orthogonalized in order to return to an Arnoldi factorization. This
procedure is developed in § 6.2 and § 6.3 including the case of purging several vectors.

6.1.3  Complications

An  immediate  question  is:  Do  any  sub-diagonal  elements  in  the  Hessenberg  matrix 
of the factorization (6.1.1) become negligible as an IRA-iteration progresses? Since a
cycle  of  the  Arnoldi  iteration  involves  performing  a  sequence  of  QR  steps,  the  question 
is  answered  by  considering  the  behavior  of  the  QR-iteration  upon  upper  Hessenberg 
matrices.  In  exact  arithmetic  under  the  assumption  that  the  Hessenberg  matrix 
is  unreduced,  only  the  last  sub-diagonal  element  may  become  zero  when  shifting. 
But  the  other  sub-diagonal  elements  may  become  arbitrarily  small.  In  addition,  as 
discussed in Chapter 5, the forward instability of an IRA-iteration possibly renders
the  sub-diagonal  entries  of  H  meaningless. 


6.2  Deflating  Converged  Ritz  Values 

During an Arnoldi iteration, Ritz values may converge with no small elements
appearing on the sub-diagonal of $H_k$. However, when a Ritz value converges,
it is always possible to make an orthogonal change of basis in which the appropriate
sub-diagonal element of $H_k$ is zero. The following result indicates how to exploit the
convergence information available in the last row of the eigenvector matrix for $H_k$. For
notational convenience, all subscripts are dropped on the Arnoldi matrices, $V$, $H$ and
$f$, for the remainder of this section.

Lemma 6.1 Let $Hs = s\theta$ where $H \in \mathbb{R}^{k \times k}$ is an unreduced upper
Hessenberg matrix and $\theta \in \mathbb{R}$ with $\|s\| = 1$. Let $W$ be a Householder
matrix such that $Ws = e_1 \zeta$ where $\zeta = \pm 1$. Then

(6.2.1)  $e_k^T W = e_k^T + w^T,$
where $\|w\| \le \sqrt{2}\, |e_k^T s|$ and

(6.2.2)  $W^T H W e_1 = \theta e_1.$

Proof The required Householder matrix has the form $W = I - \gamma (s - \zeta e_1)(s - \zeta e_1)^T$,
where $\gamma = (1 + |e_1^T s|)^{-1}$. A direct computation reveals that

(6.2.3)  $e_k^T W = e_k^T + w^T,$

where $w^T = \gamma\, e_k^T s\, (\zeta e_1^T - s^T)$. Estimating

$\|w\| = \frac{|e_k^T s|}{1 + |e_1^T s|} \|s - \zeta e_1\| = \frac{|e_k^T s|}{1 + |e_1^T s|} \sqrt{2(1 + |e_1^T s|)} \le \sqrt{2}\, |e_k^T s|,$

establishes the bound on $\|w\|$. The final assertion (6.2.2) follows from

$W^T H W e_1 = \zeta W^T H s = \zeta \theta W^T s = \zeta \theta W s = \theta e_1.$

□

The hypothesis that $H$ is unreduced assures that $|e_k^T s| \ne 0$ by Lemma 2.1.
Lemma 6.1 indicates that the last row and column of $W$ differ from the last row
and column of $I_k$ by terms of order $|e_k^T s|$. The Ritz estimate (2.5.1) will indicate
when the corresponding Ritz value $\theta$ may be deflated.
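
The statements of Lemma 6.1 can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses a random symmetric tridiagonal matrix (so that $H$ is unreduced and has real eigenpairs), and it picks $\zeta = -\mathrm{sign}(e_1^T s)$, the sign for which the normalization $\gamma = (1 + |e_1^T s|)^{-1}$ of the proof applies. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
k = 6
# Symmetric tridiagonal H: unreduced upper Hessenberg with real eigenpairs.
a = rng.standard_normal(k)
b = rng.uniform(0.5, 1.5, k - 1)      # nonzero sub-diagonal, so H is unreduced
H = np.diag(a) + np.diag(b, -1) + np.diag(b, 1)

evals, evecs = np.linalg.eigh(H)
theta, s = evals[0], evecs[:, 0]      # unit eigenvector with H s = s theta

e1 = np.zeros(k)
e1[0] = 1.0
zeta = -1.0 if s[0] >= 0 else 1.0     # assumed sign choice, matches gamma below
gamma = 1.0 / (1.0 + abs(s[0]))
v = s - zeta * e1
W = np.eye(k) - gamma * np.outer(v, v)   # Householder matrix with W s = zeta e_1

w = W[-1, :].copy()                   # last row of W ...
w[-1] -= 1.0                          # ... minus e_k^T gives w^T in (6.2.1)

first_col = (W.T @ H @ W)[:, 0]       # should equal theta e_1 by (6.2.2)
print(np.linalg.norm(w), np.sqrt(2.0) * abs(s[-1]))
```

The printed pair shows the norm of $w$ next to its bound $\sqrt{2}\,|e_k^T s|$.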

Rewriting (2.2.1) as

$AVW = VW\, W^T H W + f e_k^T W,$

and using both (6.2.1) and (6.2.2) and partitioning we obtain

(6.2.4)  $AVW = VW \begin{bmatrix} \theta & h^T \\ 0 & \tilde{H} \end{bmatrix} + f e_k^T + f w^T.$


Equation (6.2.4) is not an Arnoldi factorization. The matrix $\tilde{H}$ of order $k-1$ needs to
be returned to upper Hessenberg form. Care must be taken not to disturb the matrix
$f e_k^T$ and the first column of $W^T H W$. To start the process we compute a Householder
matrix $W_1$ such that

$W_1^T \tilde{H} W_1 = \begin{bmatrix} M & g \\ \beta_k e_{k-2}^T & \gamma \end{bmatrix},$

with $e_{k-1}^T W_1 = e_{k-1}^T$. The above idea is repeated resulting in Householder matrices
$W_1, W_2, \ldots, W_{k-3}$ that return $\tilde{H}$ to upper Hessenberg form. Defining

$\hat{W} = \begin{bmatrix} 1 & 0 \\ 0 & W_1 W_2 \cdots W_{k-3} \end{bmatrix},$

it follows by the construction of the $W_j$ that $e_k^T \hat{W} = e_k^T$ and

(6.2.5)  $\hat{W}^T W^T H W \hat{W} e_1 = \theta e_1.$
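
The reduction that leaves the last row of the transformation untouched can be sketched as follows. This is a minimal illustration, not the thesis implementation: the function name `hessenberg_keep_last_row` is hypothetical, the matrix is processed row by row from the bottom up, and each Householder reflection acts only on the leading indices so that the last unit vector is preserved. In the setting of the text the routine would be applied to the trailing block of order $k-1$.

```python
import numpy as np

def hessenberg_keep_last_row(G):
    """Return (Hess, P) with Hess = P.T @ G @ P upper Hessenberg and
    e_last^T P = e_last^T, using Householder reflections built row by row
    from the bottom of G upward."""
    G = np.array(G, dtype=float)
    n = G.shape[0]
    P = np.eye(n)
    for i in range(n - 1, 1, -1):
        # Map row i, columns 0..i-1, onto a multiple of the last unit vector
        # of that segment: (x_0, ..., x_{i-1}) -> (0, ..., 0, alpha).
        x = G[i, :i].copy()
        alpha = -np.copysign(np.linalg.norm(x), x[-1])
        v = x.copy()
        v[-1] -= alpha
        vnorm = np.linalg.norm(v)
        if vnorm == 0.0:
            continue                      # row segment already zero
        v /= vnorm
        Wi = np.eye(n)
        Wi[:i, :i] -= 2.0 * np.outer(v, v)   # acts on indices 0..i-1 only
        G = Wi.T @ G @ Wi
        P = P @ Wi
    return G, P

rng = np.random.default_rng(2)
G0 = rng.standard_normal((6, 6))
Hess, P = hessenberg_keep_last_row(G0)
print(np.round(Hess, 3))
```

For a matrix of order $n$ the loop uses $n-2$ reflections, which agrees with the $k-3$ Householder matrices needed for a block of order $k-1$.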


The process of computing a similarity transformation as in equation (6.2.5) is not
new. Wilkinson discusses the more general notion of deflating with invariant subspaces
in § 20-25, Chapter 9 in [101]. Wilkinson also references the work of Feller
and Forsythe [31] who appear to be the first to use elementary Householder transformations
for deflation. Problem 7.4.8 of [35] addresses the case when working with
upper Hessenberg matrices. What appears to be new is the application to the Arnoldi
factorization for converged Ritz values.

Since $\|f w^T \hat{W}\| = \|f\| \|\hat{W}^T w\| = \|f\| \|w\|$, the size of $\|f w^T\|$ remains
unchanged. Making the updates

$V \leftarrow V W \hat{W}, \quad H \leftarrow \hat{W}^T W^T H W \hat{W}, \quad w^T \leftarrow w^T \hat{W},$

we obtain the relation

(6.2.6)  $AV = VH + f e_k^T + f w^T.$

A deflated Arnoldi factorization is obtained from (6.2.6) by discarding the term $f w^T$.

The  following  theorem  shows  that  the  deflated  Arnoldi  factorization  resulting  from 
this  scheme  is  an  exact  length  k  factorization  for  a  nearby  matrix. 

Theorem 6.1 Let an Arnoldi factorization of length k be given by
(6.2.6) where $Hs = s\theta$ and $\sqrt{2}\, |e_k^T s|\, \|f\| \le \epsilon \|A\|$ for $\epsilon > 0$. Then there
exists a matrix $E \in \mathbb{R}^{n \times n}$ such that

(6.2.7)  $(A + E)V = VH + f e_k^T,$
where $\|E\| \le \epsilon \|A\|$.

Proof Subtract $f w^T$ from both sides of equation (6.2.6). Set $E = -f(Vw)^T$ and
then

$EV = -f(Vw)^T V = -f w^T,$

and equation (6.2.7) follows. Using Lemma 6.1 it follows that
$\|E\| = \|f\| \|w\| \le \sqrt{2}\, |e_k^T s|\, \|f\| \le \epsilon \|A\|$. □

If A is symmetric then the choice $E = -f(Vw)^T - (Vw) f^T$ results in a symmetric
perturbation. If $\epsilon$ is on the order of unit roundoff then the deflation scheme introduces
a perturbation of the same order as those already present from computing the Arnoldi
factorization in floating point arithmetic.
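
The rank-one perturbation of Theorem 6.1 can be verified on a small example. The sketch below makes several assumptions that are not part of the text: the helper `arnoldi` is a plain Gram-Schmidt implementation written for this example, the test matrix is symmetric so that every Ritz value is real, and only the Householder matrix $W$ of Lemma 6.1 is applied (the additional orthogonal factor that restores Hessenberg form is omitted, since it preserves $e_k^T$ and does not change the identity being checked).

```python
import numpy as np

def arnoldi(A, v0, k):
    """Length-k Arnoldi factorization A V = V H + f e_k^T (Gram-Schmidt, two passes)."""
    n = A.shape[0]
    V = np.zeros((n, k))
    H = np.zeros((k, k))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(k):
        r = A @ V[:, j]
        for _ in range(2):                       # second pass re-orthogonalizes
            c = V[:, : j + 1].T @ r
            r = r - V[:, : j + 1] @ c
            H[: j + 1, j] += c
        if j + 1 < k:
            H[j + 1, j] = np.linalg.norm(r)
            V[:, j + 1] = r / H[j + 1, j]
    return V, H, r                               # r is the residual vector f

rng = np.random.default_rng(3)
n, k = 40, 8
B = rng.standard_normal((n, n))
A = (B + B.T) / 2                                # symmetric test matrix (assumption)
V, H, f = arnoldi(A, rng.standard_normal(n), k)

evals, S = np.linalg.eigh((H + H.T) / 2)         # H is tridiagonal up to rounding
idx = np.argmin(np.abs(S[-1, :]))                # Ritz pair with smallest estimate
theta, s = evals[idx], S[:, idx]

e1 = np.zeros(k); e1[0] = 1.0
ek = np.zeros(k); ek[-1] = 1.0
zeta = -1.0 if s[0] >= 0 else 1.0
v = s - zeta * e1
W = np.eye(k) - np.outer(v, v) / (1.0 + abs(s[0]))   # Householder, W s = zeta e_1
w_row = W[-1, :] - ek                            # e_k^T W = e_k^T + w^T

V_new, H_new = V @ W, W.T @ H @ W
E = -np.outer(f, V_new @ w_row)                  # E = -f (V w)^T
lhs = (A + E) @ V_new
rhs = V_new @ H_new + np.outer(f, ek)
print(np.linalg.norm(lhs - rhs), np.linalg.norm(E, 2))
```

The first printed number is at rounding level, and the second is the size of the perturbation, bounded by $\sqrt{2}\,|e_k^T s|\,\|f\|$.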

Once a converged Ritz value $\theta$ is deflated, the Arnoldi vector corresponding to
$\theta$ is locked or purged as described in the previous section. The only difficulty that
remains is decoupling the Ritz vector corresponding to the Ritz value $\theta$, or purging,
from the trailing factorization when A is nonsymmetric.

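If A is not symmetric then the Ritz pair may not be purged immediately because of
the presence of $h$. A standard reduction of $H$ to block diagonal form is used. If $\theta$ is
not an eigenvalue of $\tilde{H}$ then we may construct a vector $z \in \mathbb{R}^{k-1}$ so that

(6.2.8)  $\begin{bmatrix} \theta & h^T \\ 0 & \tilde{H} \end{bmatrix} \begin{bmatrix} 1 & z^T \\ 0 & I_{k-1} \end{bmatrix} = \begin{bmatrix} 1 & z^T \\ 0 & I_{k-1} \end{bmatrix} \begin{bmatrix} \theta & 0 \\ 0 & \tilde{H} \end{bmatrix}.$

Solving the linear system

(6.2.9)  $(\tilde{H}^T - \theta I_{k-1}) z = h,$

determines $z$. Define

$Z = \begin{bmatrix} 1 & z^T \\ 0 & I_{k-1} \end{bmatrix}.$

Post multiplication of equation (6.2.6) by $Z$ results in

$AVZ = VZ \begin{bmatrix} \theta & 0 \\ 0 & \tilde{H} \end{bmatrix} + f e_k^T + f w^T Z,$

since $e_k^T Z = e_k^T$. Equating the last $k-1$ columns of the previous expression results in

(6.2.10)  $AV \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix} = V \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix} \tilde{H} + f e_{k-1}^T + f w^T \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix}.$

Compute the factorization (using $k-1$ Givens rotations)

(6.2.11)  $QR = \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix},$

where $Q \in \mathbb{R}^{k \times (k-1)}$ with $Q^T Q = I_{k-1}$ and $R$ is an upper triangular matrix of order
$k-1$. Since the last $k-1$ columns of $Z$ are linearly independent, $R$ is nonsingular.
Post multiplying equation (6.2.10) by $R^{-1}$ gives

(6.2.12)  $AVQ = VQ\, R \tilde{H} R^{-1} + \rho_{k-1}^{-1} f e_{k-1}^T + f w^T Q,$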

where $\rho_{k-1} = e_{k-1}^T R e_{k-1}$. The last term $f w^T Q$ in (6.2.12) is discarded by the deflation
scheme and this relation shows that the discarded term is not magnified in norm by
the purging procedure. The matrix $R \tilde{H} R^{-1}$ remains upper Hessenberg since $R$ is
upper triangular. Partitioning $Q$ conformably with the right side of equation (6.2.11)
results in

$\begin{bmatrix} q_{11}^T \\ Q_{21} \end{bmatrix} R = \begin{bmatrix} z^T \\ I_{k-1} \end{bmatrix},$

and it follows that $R^{-1} = Q_{21}$. Using the Cauchy-Schwarz inequality it follows that
$|\rho_{k-1}^{-1}| = |e_{k-1}^T Q_{21} e_{k-1}| \le 1$ and hence the Arnoldi residual is not amplified by the
purging. The final purged Arnoldi factorization is

(6.2.13)  $AVQ = VQ\, R \tilde{H} Q_{21} + \rho_{k-1}^{-1} f e_{k-1}^T.$

The  similarity  transformation  that  produces  the  new  upper  Hessenberg  matrix 
does  not  affect  the  eigenvectors  and  thus  the  Ritz  estimates.  Since  the  Ritz  estimates 
are just the residuals of the Ritz pairs which are determined by A and $\mathcal{R}(V)$, the
similarity transformation performed on H through R does not affect the Ritz pairs.
Only the basis representation of $\mathcal{R}(V)$ is modified so that we may decouple and
discard  an  unwanted  Ritz  pair. 

Performing the set of updates

$V \leftarrow VQ, \quad H \leftarrow R \tilde{H} Q_{21},$

defines equation (6.2.13) as an Arnoldi factorization of length $k-1$. Theorem 6.1
implies this is an Arnoldi factorization for a nearby matrix. It is easily verified that
$V^T f (e_{k-1}^T + w^T) = 0$ and that $H$ is an upper Hessenberg matrix of order $k-1$.
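
The purging of a single deflated Ritz value can be traced on a synthetic example. The sketch below is an illustration under explicit assumptions: the matrix `A` is constructed so that a factorization with the deflated structure $H = \begin{bmatrix} \theta & h^T \\ 0 & \tilde{H} \end{bmatrix}$ holds exactly, the discarded term $f w^T$ is taken to be zero, the value `theta = 10.0` is chosen far from the spectrum of the trailing block, and `numpy.linalg.qr` (Householder based) stands in for the Givens rotations mentioned in the text.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 12, 5
theta = 10.0                                     # deflated Ritz value (assumed)
h = rng.standard_normal(k - 1)
Ht = np.triu(rng.standard_normal((k - 1, k - 1)), k=-1)   # trailing Hessenberg block
H = np.zeros((k, k))
H[0, 0], H[0, 1:], H[1:, 1:] = theta, h, Ht

# Synthetic factorization A V = V H + f e_k^T: A is built to satisfy it.
Qfull, _ = np.linalg.qr(rng.standard_normal((n, k + 1)))
V, f = Qfull[:, :k], 0.1 * Qfull[:, k]
ek = np.zeros(k)
ek[-1] = 1.0
A = (V @ H + np.outer(f, ek)) @ V.T

z = np.linalg.solve(Ht.T - theta * np.eye(k - 1), h)      # linear system (6.2.9)
B = np.vstack([z, np.eye(k - 1)])                          # [z^T; I_{k-1}]
Q, R = np.linalg.qr(B)                                     # factorization (6.2.11)
Hp = R @ Ht @ np.linalg.inv(R)                             # new Hessenberg matrix
rho = R[-1, -1]
Vp = V @ Q                                                 # new orthonormal basis

lhs = A @ Vp
rhs = Vp @ Hp + np.outer(f, ek[1:]) / rho                  # purged relation (6.2.13)
print(np.linalg.norm(lhs - rhs), abs(1.0 / rho))
```

The residual vector is rescaled by $\rho_{k-1}^{-1}$, whose magnitude does not exceed one, so the purge does not amplify the Arnoldi residual in this example.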

6.3  A  Practical  Deflating  Procedure 

The practical issues associated with a numerically stable deflating procedure are
addressed in this section. These include:

1.  Performing  the  deflation  in  real  arithmetic  when  a  converged  Ritz  value 
has  a  non-zero  imaginary  component. 

2.  Deflation  with  more  than  one  converged  Ritz  value. 

3.  Error  Analysis. 

Section 6.3.2 presents two algorithms that implement the deflation schemes. The
error analysis of the two deflation schemes is presented in the next section.

6.3.1  Deflation with Real Arithmetic

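Suppose $s = t + iu$ and $\theta = \nu + i\mu$ is an eigenpair of $H$ where $t$ and $u$ are unit vectors
in $\mathbb{R}^k$, $H \in \mathbb{R}^{k \times k}$ and $\mu \ne 0$. Thus

$H \begin{bmatrix} t & u \end{bmatrix} = \begin{bmatrix} t & u \end{bmatrix} C, \quad C = \begin{bmatrix} \nu & \mu \\ -\mu & \nu \end{bmatrix}.$

Factor

(6.3.1)  $\begin{bmatrix} t & u \end{bmatrix} = Q \begin{bmatrix} R \\ 0 \end{bmatrix},$

where $Q^T Q = I_k$ and $R$ is an upper triangular matrix. It is easily shown that $t$ and
$u$ are linearly independent as vectors in $\mathbb{R}^k$ since $\mu \ne 0$ and the non-singularity of $R$
follows. Performing a similarity transformation with $Q$ on $H$ gives

$Q^T H Q = \begin{bmatrix} R C R^{-1} & * \\ 0 & * \end{bmatrix}.$

Suppose that $H$ corresponds to an Arnoldi factorization of length k and that the last
components of $t$ and $u$ are of order $\epsilon$, that is $|e_k^T t| = O(\epsilon)$ and $|e_k^T u| = O(\epsilon)$. In order
to deflate the complex conjugate pair of eigenvalues from the
factorization in an implicit manner, we require that $e_k^T Q = e_k^T + q^T$ where $\|q\| = O(\epsilon)$.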

We now show that the magnitudes of the last components of $t$ and $u$ are not
sufficient to guarantee the required form for $Q$. Suppose that $u = t \cos\phi + r \sin\phi$
where $r$ is a unit vector orthogonal to $t$ and $\phi$ measures the positive angle between $t$
and $u$. Lemma 6.1 allows a Householder matrix $W_1$ such that

$W_1^T \begin{bmatrix} t & u \end{bmatrix} = \begin{bmatrix} \zeta_1 e_1 & \zeta_1 e_1 \cos\phi + W_1^T r \sin\phi \end{bmatrix} = \begin{bmatrix} \zeta_1 & \zeta_1 \cos\phi \\ 0 & \hat{u} \end{bmatrix},$

where $\zeta_1 = \pm 1$ and the last column and row of $W_1$ and $I_k$ are order $|e_k^T t|$ equivalent.
To compute the required orthogonal factorization in equation (6.3.1) another
Householder matrix $W_2 = \begin{bmatrix} 1 & 0 \\ 0 & \hat{W}_2 \end{bmatrix}$ is needed so that $\hat{W}_2^T \hat{u} = \pm \|\hat{u}\| e_1$. But
Lemma 6.1 only results in $e_{k-1}^T \hat{W}_2 = e_{k-1}^T + w_2^T$ with $\|w_2\| = O(\epsilon)$ if $e_{k-1}^T \hat{u}$ is small
relative to $\|\hat{u}\|$. Unfortunately, if $\phi$ is small, $W_1^T u \approx \zeta_1 e_1$ and $\|\hat{u}\| \approx \phi \approx 0$. Hence we
cannot obtain the required form for $Q = W_1 W_2$.

Fortunately, when $t$ and $u$ are nearly aligned, $\mu$ may be neglected as the following
result demonstrates.

Lemma 6.2 Let $H(t + iu) = (\nu + i\mu)(t + iu)$ where $t$ and $u$ are unit
vectors in $\mathbb{R}^k$, $H \in \mathbb{R}^{k \times k}$ and $\mu \ne 0$. Suppose that $\phi$ measures the positive
angle between $t$ and $u$. Then

(6.3.2)  $|\mu| \le \sin\phi\, \|H\|.$

Proof Let $u = t \cos\phi + r \sin\phi$ where $r$ is a unit vector orthogonal to $t$ and $\phi$
measures the positive angle between $t$ and $u$. Equating real and imaginary parts of
$H(t + iu) = (\nu + i\mu)(t + iu)$ results in $Ht = t\nu - u\mu$ and $Hu = t\mu + u\nu$. The desired
estimate follows since

$2\mu = t^T H u - u^T H t = (t^T H r - r^T H t) \sin\phi,$

results in $|\mu| \le \sin\phi\, \|H\|$. □
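
The bound of Lemma 6.2 can be observed numerically. The sketch below is an illustration with assumed ingredients: a random nearly skew-symmetric matrix is used so that complex eigenvalues are present, and the computed eigenvector is multiplied by a complex scalar so that its real and imaginary parts are both unit vectors, as the lemma assumes. The phase formula used for that normalization is a detail of this example, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 6
G = rng.standard_normal((k, k))
H = (G - G.T) + 0.1 * rng.standard_normal((k, k))   # nearly skew: complex spectrum

evals, evecs = np.linalg.eig(H)
idx = np.argmax(evals.imag)                  # eigenvalue nu + i mu with mu > 0
nu, mu = evals[idx].real, evals[idx].imag
s = evecs[:, idx]

# Rotate the phase of s so that real and imaginary parts have equal norm,
# then rescale so that both are unit vectors.
t, u = s.real, s.imag
alpha = 0.5 * np.arctan2(t @ t - u @ u, 2.0 * (t @ u))
s = np.exp(1j * alpha) * s
s = s / np.linalg.norm(s.real)
t, u = s.real, s.imag

cos_phi = np.clip(t @ u, -1.0, 1.0)
sin_phi = np.sqrt(1.0 - cos_phi**2)          # phi is the angle between t and u
bound = sin_phi * np.linalg.norm(H, 2)
print(abs(mu), bound)
```

When the two printed numbers are compared, the imaginary part is seen to be limited by the angle between $t$ and $u$: nearly parallel real and imaginary parts force a small $\mu$.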

For small $\phi$, $t$ and $u$ are almost parallel eigenvectors of $H$ corresponding to a
nearly multiple eigenvalue. Numerically, we set $\mu$ to zero and deflate one copy of $\nu$
from the Arnoldi factorization.

A computable bound on the size of the angle $\phi$ is now determined using only
the real and imaginary parts of the eigenvector. The second Householder matrix $W_2$
should not be computed if

(6.3.3)  >  INI  144 

Recall that Lemma 6.1 gives $e_k^T W_1 = e_k^T + w_1^T$ where $w_1^T = \gamma\, e_k^T t\, (\zeta_1 e_1^T - t^T)$ and
$\gamma = (1 + |e_1^T t|)^{-1}$. Thus

$e_{k-1}^T \hat{u} = e_k^T W_1^T u = e_k^T W_1 u = e_k^T u + w_1^T u,$
as our computable bound.

Suppose that $HX = XD$ where $X \in \mathbb{R}^{k \times j}$ and $D$ is a quasi-diagonal matrix. The
eigenvalues of $H$ are on the diagonal of $D$ if they have zero imaginary component
and in blocks of two for the complex conjugate pairs. The columns of $X$ span the
eigenspace corresponding to diagonal values of $D$. For the blocks of order two on the
diagonal the corresponding complex eigenvector is stored in two consecutive columns
of $X$, the first holding the real part, and the second the imaginary part. If we want to
block deflate $X$, where the last row is small, from $H$, then we could proceed as follows.
Compute the orthogonal factorization $X = Q \begin{bmatrix} R \\ 0 \end{bmatrix}$ via Householder reflectors where
$Q^T Q = I_k$ and $R \in \mathbb{R}^{j \times j}$ is upper triangular. Then the last row and column of $Q$
differ from that of $I_k$ with terms on the same order of the entries in the last row of $X$ if
the condition number of $R$ is modest. Theorem 6.4 makes this last statement precise.
Thus if the columns of $X$ are not almost linearly dependent, an appropriate $Q$ may
be determined. Finally, we note that when $H$ is a symmetric tridiagonal matrix, an
appropriate $Q$ may always be determined.
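
The claim about the last row of $Q$ can be illustrated with a small experiment. The sketch below assumes a random matrix `X` with two well separated unit columns and a last row of size about `eps`; it is not an eigenvector matrix of any particular $H$, only a stand-in with the relevant structure. The full orthogonal factor comes from `numpy.linalg.qr` with `mode="complete"`, which is Householder based.

```python
import numpy as np

rng = np.random.default_rng(6)
k, j, eps = 8, 2, 1e-8                  # illustrative sizes and last-row scale
X = rng.standard_normal((k, j))
X[-1, :] = eps * rng.standard_normal(j)         # small last row
X /= np.linalg.norm(X, axis=0)                  # unit columns

Q, Rfull = np.linalg.qr(X, mode="complete")     # X = Q [R; 0] with Q of order k
R = Rfull[:j, :]

q = Q[-1, :].copy()                             # last row of Q ...
q[-1] -= 1.0                                    # ... minus e_k^T gives q^T
print(np.linalg.norm(q), np.linalg.cond(R))
```

With a modest condition number of $R$, the norm of $q$ stays at the level of the last row of $X$. Making the columns of `X` nearly dependent would be expected to increase it.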


6.3.2  Algorithms  for  Deflating  Converged  Ritz  Values 


The  two  procedures  presented  in  this  section  extend  the  ideas  of  §  6.1  to  provide 
deflation  of  more  than  one  converged  Ritz  value  at  a  time.  The  first  purges  the 
factorization  of  the  unwanted  converged  Ritz  values.  The  second  locks  the  Arnoldi 
vectors corresponding to the desired converged Ritz values. When both deflation
algorithms are incorporated within an IRA-iteration, the locked vectors form a basis for
an approximate invariant subspace of A. This truncated factorization is an approximate
partial Schur decomposition. When A is symmetric, the approximate Schur
vectors  are  Ritz  vectors  and  the  upper  quasi-triangular  matrix  is  the  diagonal  matrix 
of  Ritz  values. 

Partition a length m Arnoldi factorization as

(6.3.5)  $A \begin{bmatrix} V_j & V_{m-j} \end{bmatrix} = \begin{bmatrix} V_j & V_{m-j} \end{bmatrix} \begin{bmatrix} H_j & M_j \\ 0 & H_{m-j} \end{bmatrix} + f e_m^T + f w^T,$

where $H_j$ and $H_{m-j}$ are upper quasi-triangular and unreduced upper Hessenberg
matrices, respectively. The matrix $H_j \in \mathbb{R}^{j \times j}$ contains the wanted converged Ritz
values of the matrix $H_m$. The columns of $V_j \in \mathbb{R}^{n \times j}$ are the locked Arnoldi vectors
that represent an approximate Schur basis for the invariant subspace of interest. The
matrix $H_{m-j}$ designates the trailing sub-matrix of order $m-j$. Analogously, the last
$m-j$ columns of $V_m$ are denoted by $V_{m-j}$. We shall refer to the last $m-j$ columns
of (6.3.5) as the active part of the factorization. Finally, $M_j \in \mathbb{R}^{j \times (m-j)}$ denotes the
sub-matrix in the north-east corner of $H_m$. Figure 6.1 illustrates the matrix product
$V_m H_m$ of equation (6.3.5).

If A is symmetric the two deflation procedures simplify considerably. In fact,
purging is only used when A is nonsymmetric for otherwise $M_j = 0_{j \times (m-j)}$ and both
$H_j$ and $H_{m-j}$ are symmetric tridiagonal matrices. Both algorithms are followed by
remarks concerning some of the specific details.

Algorithm 6.2

function $[V_m, H_m, f_m]$ = Lock $(V_m, H_m, f_m, X_i, j)$

INPUT: A length m Arnoldi factorization $A V_m = V_m H_m + f_m e_m^T$. The
first j columns of $V_m$ represent an approximate invariant subspace for
A. The leading principal sub-matrix $H_j$ of order j of $H_m$ is upper
quasi-triangular and contains the converged Ritz values of interest. The columns
of $X_i \in \mathbb{R}^{(m-j) \times i}$ are the eigenvectors corresponding to the eigenvalues that
are to be locked.

OUTPUT: A length m Arnoldi factorization defined by $V_m$, $H_m$ and $f_m$
where the first j + i columns of $V_m$ are an approximate invariant subspace
for A.

1.  Compute the orthogonal factorization

    $Q \begin{bmatrix} R_i \\ 0_{(m-j-i) \times i} \end{bmatrix} = X_i,$

    where $Q \in \mathbb{R}^{(m-j) \times (m-j)}$ using Householder matrices ;

2.  Update the factorization

    $H_{m-j} \leftarrow Q^T H_{m-j} Q$ ;  $V_{m-j} \leftarrow V_{m-j} Q$ ;  $M_j \leftarrow M_j Q$ ;

3.  Compute an orthogonal matrix $P \in \mathbb{R}^{(m-j-i) \times (m-j-i)}$ using Householder
    matrices that restores $H_{m-j-i}$ to upper Hessenberg form ;

4.  Update the factorization

    $H_{m-j-i} \leftarrow P^T H_{m-j-i} P$ ;  $V_{m-j-i} \leftarrow V_{m-j-i} P$ ;  $M_{j+i} \leftarrow M_{j+i} P$ ;

Figure 6.1 The matrix product $V_m H_m$ of the factorization upon entering
Algorithm 6.2 or 6.3. The shaded region corresponds to the converged
portion of the factorization. (Legend: Locked Vectors, Active Factorization.)

Line 1 computes an orthogonal basis for the eigenvectors of $H_{m-j}$ that correspond
to the Ritz estimates that are converged. The matrix of eigenvectors in line 1 satisfies
the equation $H_{m-j} X_i = X_i D_i$ where $D_i$ is a quasi-diagonal matrix containing the
eigenvalues to be locked. From § 6.3.1, we see that the leading sub-matrix of
$Q^T H_{m-j} Q$ of order i is upper quasi-triangular. The required relation $e_{m-j}^T Q = e_{m-j}^T + q^T$,
with $\|q\|$ small is guaranteed if the condition number of $R_i$ is modest. Since i is
typically a small number, we compute the condition number of $R_i$. The number of
vectors to be locked is assumed to be such that the condition number of $R_i$ is small.
In particular, if $H_m$ is a symmetric tridiagonal matrix, $Q$ always has the required
form. Lines 3-4 return the updated $H_{m-j}$ to upper Hessenberg form.

Before entering Purge, the unwanted converged Ritz pairs are placed at the front
of the factorization. A prior call to Lock places the unwanted values and vectors
at the beginning of the factorization. Unlike Lock, the procedure Purge requires
accessing and updating the entire factorization in the nonsymmetric case. Thus, for
large scale nonsymmetric eigenvalue computations, the amount of purging performed
should be kept to a minimum.

Algorithm 6.3

function $[V_{m-i}, H_{m-i}, f_{m-i}]$ = Purge $(V_m, H_m, f_m, j, i)$

INPUT: A length m Arnoldi factorization $A V_m = V_m H_m + f_m e_m^T$. The
first i + j columns of $V_m$ represent an approximate invariant subspace for
A. The leading principal sub-matrix $H_{i+j}$ of order i + j of $H_m$ is upper
quasi-triangular and contains the converged Ritz values. The i unwanted
converged eigenvalues are in the leading portion of $H_{i+j}$. The converged
complex conjugate Ritz pairs are stored in $2 \times 2$ blocks on the diagonal
of $H_{i+j}$.

OUTPUT: A length $m-i$ Arnoldi factorization defined by $V_{m-i}$, $H_{m-i}$
and $f_{m-i}$ purged of the unwanted converged Ritz values and corresponding
Schur vectors.

Lines 1-3 purge the factorization of the unwanted converged Ritz
values contained in the leading portion of $H_m$ ;

1.  Solve the Sylvester set of equations,

    $Z H_{m-i} - H_i Z = M_i,$

    for $Z \in \mathbb{R}^{i \times (m-i)}$ that arise from block diagonalizing $H_m$ ;

    $\begin{bmatrix} H_i & M_i \\ 0 & H_{m-i} \end{bmatrix} \begin{bmatrix} I_i & Z \\ 0 & I_{m-i} \end{bmatrix} = \begin{bmatrix} I_i & Z \\ 0 & I_{m-i} \end{bmatrix} \begin{bmatrix} H_i & 0 \\ 0 & H_{m-i} \end{bmatrix}$

2.  Compute the orthogonal factorization

    $Q R_{m-i} = \begin{bmatrix} Q_i \\ Q_{m-i} \end{bmatrix} R_{m-i} = \begin{bmatrix} Z \\ I_{m-i} \end{bmatrix},$

    where $Q \in \mathbb{R}^{m \times (m-i)}$ using Householder matrices ;

3.  Update the factorization and obtain a length $m-i$ factorization ;

    $H_{m-i} \leftarrow R_{m-i} H_{m-i} Q_{m-i}$ ;  $V_{m-i} \leftarrow V_m Q$ ;  $f_{m-i} \leftarrow \rho_{m-i,m-i}^{-1} f_m$ ;
    where $\rho_{m-i,m-i} = e_{m-i}^T R_{m-i} e_{m-i}$ ;

Figure 6.2 The matrix product $V_m H_m$ of the factorization just prior to
discarding in Algorithm 6.3. The darkly shaded regions may now be dropped
from the factorization. (Legend: Vectors to be Purged, Locked Vectors, Active Factorization.)

At the completion of Algorithm 6.3 the factorization is of length $m-i$ and the
leading sub-matrix of order j will be upper quasi-triangular. The wanted converged
Ritz values will either be on the diagonal if real or in blocks of two for the complex
conjugate pairs. Figure 6.2 shows the structure of the updated $V_m H_m$ just prior to
discarding the unwanted portions.

The solution of the Sylvester equation at line 1 determines the matrix $Z$ that
block diagonalizes the spectrum of $H_m$ into two sub-matrices. The unwanted portion
is in the leading corner and the remaining eigenvalues of $H_m$ are in the other block.
A solution $Z$ exists when $H_i$ and $H_{m-i}$ do not have a common eigenvalue. If
an eigenvalue is shared by $H_i$ and $H_{m-i}$, then $H_m$ has an eigenvalue of multiplicity
greater than one. The remedy is a criterion that determines whether to increase or
decrease i, the number of Ritz values that require purging. Analysis similar to that
in section 6.2 demonstrates that after line 3 the Ritz estimates for the eigenvalues
of $H_{m-i}$ are not altered. We also remark that $R_{m-i}$ is nonsingular since the matrix
$\begin{bmatrix} Z \\ I_{m-i} \end{bmatrix}$ is of full column rank and that $|\rho_{m-i,m-i}^{-1}| \le 1$.
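
The three lines of the purging procedure can be followed at the level of the projected matrix $H_m$. The sketch below is a minimal illustration under assumptions: the blocks are random, the unwanted block is shifted by 10 so that the two spectra are disjoint, the Sylvester equation is solved through its Kronecker product form (adequate for small examples only, and not the method of the thesis), and the updates of $V_m$ and $f_m$ are left out.

```python
import numpy as np

rng = np.random.default_rng(7)
m, i = 7, 2
Hi = np.triu(rng.standard_normal((i, i))) + 10.0 * np.eye(i)   # unwanted block
Hr = np.triu(rng.standard_normal((m - i, m - i)), k=-1)        # H_{m-i}, Hessenberg
Mi = rng.standard_normal((i, m - i))
Hm = np.block([[Hi, Mi], [np.zeros((m - i, i)), Hr]])

# Line 1: Z H_{m-i} - H_i Z = M_i in Kronecker form with column-major vec:
# (H_{m-i}^T kron I_i - I_{m-i} kron H_i) vec(Z) = vec(M_i).
K = np.kron(Hr.T, np.eye(i)) - np.kron(np.eye(m - i), Hi)
Z = np.linalg.solve(K, Mi.flatten(order="F")).reshape((i, m - i), order="F")

# Line 2: orthogonal factorization of [Z; I_{m-i}].
B = np.vstack([Z, np.eye(m - i)])
Q, R = np.linalg.qr(B)                  # Q is m x (m-i), R is upper triangular

# Line 3: new projected matrix, using R^{-1} = Q_{m-i} (the trailing rows of Q).
H_new = R @ Hr @ Q[i:, :]
rho = R[-1, -1]
print(np.round(H_new, 3), abs(1.0 / rho))
```

The columns of `Q` span an invariant subspace of `Hm` that excludes the unwanted block, so `Hm @ Q` equals `Q @ H_new` up to rounding, and `H_new` is again upper Hessenberg.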


6.4  Error  Analysis 

This section examines the numerical stability of the two deflation algorithms when
computing in finite precision arithmetic. A stable algorithm computes the exact
solution of a nearby problem. It will be shown that Algorithms 6.3 and 6.2 deflate
slightly perturbed matrices.


For ease of notation $H = \begin{bmatrix} H_{11} & H_{12} \\ H_{21} & H_{22} \end{bmatrix}$ replaces $H_m \in \mathbb{R}^{m \times m}$ used by procedures
Lock and Purge of § 6.3.2. The sub-matrix $H_{11}$ is of order i and $H_{21}$ is zero except for
the sub-diagonal entry of $H$ located in the north-east corner. Analogously, $\hat{H}$ represents
$H$ after the similarity transformation performed by Lock or Purge, partitioned
conformably.


6.4.1  Locking 

The locking scheme is considered successful if the desired eigenvalues end up in $\hat{H}_{11}$
and $\hat{H}_{21}$ is small in norm. The largest source of error is from computing an orthogonal
factorization from the approximate eigenvector matrix containing the vectors to be
locked.

The matrix pair $(X, D)$ represents an approximate quasi-diagonal form for $H$. The
computed columns of $X$ span the right eigenspace corresponding to diagonal blocks
of $D$. We assume that $X$ is a non-singular matrix and that each column is a unit
vector.

Standard results give $\|XD - HX\| \le \epsilon_1 \|H\|$ where $\epsilon_1$ is a small multiple of machine
precision for a stable algorithm. Defining the matrix $E = (XD - HX) Y^T$ where
$X^{-1} = Y^T$ it follows that $(H + E)X = XD$. If $\sigma_m(X)$ is the smallest singular value
of $X$ then $\|X^{-1}\| = \sigma_m^{-1}(X)$. Since each column of $X$ is a unit vector, $\|X\| \le \sqrt{m}$.
If $\kappa(X) = \|X\| \|X^{-1}\|$ is the condition number for the matrix of approximate
eigenvectors, $\|E\| \le \epsilon_1 \kappa(X) \|H\|$. If $X$ is a well conditioned matrix then the approximate
quasi-diagonal form for $H$ is exact for a nearby matrix. In particular, if $H$ is
symmetric then $E$ is always a small perturbation. As the columns of $X$ become linearly
dependent, $\sigma_m(X)$ decreases and $E$ may represent a large perturbation.
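
The backward error matrix $E$ of the preceding paragraph can be formed explicitly. In the sketch below the test matrix, the size of the artificial perturbation (`1e-3`) applied to the computed eigenvectors and the use of a plain diagonal $D$ are assumptions made so that the effect is visible. A complex diagonal form would also pass through the same code, although the text works with a real quasi-diagonal form.

```python
import numpy as np

rng = np.random.default_rng(8)
m = 6
X0 = rng.standard_normal((m, m))
H = X0 @ np.diag(np.arange(1.0, m + 1)) @ np.linalg.inv(X0)   # known spectrum 1..m

d, X = np.linalg.eig(H)
X = X + 1e-3 * rng.standard_normal((m, m))   # make the eigenvectors only approximate
X = X / np.linalg.norm(X, axis=0)            # unit columns
D = np.diag(d)

resid = X @ D - H @ X                        # XD - HX
Yt = np.linalg.inv(X)                        # Y^T = X^{-1}
E = resid @ Yt                               # backward error: (H + E) X = X D

normH = np.linalg.norm(H, 2)
eps1 = np.linalg.norm(resid, 2) / normH      # plays the role of epsilon_1
kappa = np.linalg.cond(X, 2)
print(np.linalg.norm(E, 2), eps1 * kappa * normH)
```

The first printed number does not exceed the second, which is the bound $\epsilon_1 \kappa(X) \|H\|$. The gap between them reflects that $\|X\| \ge 1$ is used in the estimate.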

The following result informs us that locking is a conditionally stable process.


Theorem 6.4 Let $H \in \mathbb{R}^{m \times m}$ be an unreduced upper Hessenberg matrix with distinct
eigenvalues. Suppose that $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$ are an
approximate quasi-diagonal form for $H$ that satisfies $(H + E)X = XD$
where $\|E\| \le \epsilon_1 \kappa(X) \|H\|$. Let $Q_1 R_1 = X_1 \in \mathbb{R}^{m \times j}$ where $Q_1^T Q_1 = I_j$.
Suppose the QR factorization of $X_1$ is computed so that $\hat{Q}\hat{R} = X_1 + \hat{E}$
where $\hat{Q}^T \hat{Q} = I_m$ and $\|\hat{E}\| \le \epsilon_2 \|X_1\|$. Both $\epsilon_1$ and $\epsilon_2$ are small multiples
of the machine precision $\epsilon_M$. Let $\epsilon = \max(\epsilon_1, 2\epsilon_2)$ and let $\kappa(R_1) =
\|R_1\| \|R_1^{-1}\|$ be the condition number for $R_1$ where

$\mu = \frac{\kappa(R_1)}{1 - \epsilon_2 \kappa(R_1)}.$

If $\eta = \epsilon(\kappa(R_1) + \epsilon\mu(1 + \epsilon\mu\kappa(R_1))) < 1$ then there exists a matrix $C \in \mathbb{R}^{m \times m}$
such that

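$\hat{Q}^T (H - C) \hat{Q} = \hat{H} = \begin{bmatrix} \hat{H}_{11} & \hat{H}_{12} \\ 0 & \hat{H}_{22} \end{bmatrix},$

where $\hat{H}_{11}$ is an upper quasi-triangular matrix similar to $D_1$ and
(6.4.1)  $\|C\| \le \epsilon(\kappa(X) + \mu) \|H\| + O(\epsilon^2).$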


A  few  remarks  are  in  order. 

1.  If $H$ is symmetric, $\hat{H}_{12} = 0$ and $\hat{H}_{11}$ is diagonal. Procedure Lock is stable
since, as noted previously, $\kappa(X) = 1$ and $\mu \approx 1$. Parlett [61, pages 85-86] proves
Theorem 6.4 for symmetric matrices when locking one approximate eigenvector.

2.  If only one column is locked, then $\mu = 1 + O(\epsilon)$ and $\|C\|$ is small relative to
$\kappa(X)\|H\|$.

3.  If $\kappa(R_1)$ is large, the columns of $X_1$ are nearly dependent. In this case, $\kappa(X)$ will
also be large and locking introduces no more error into the computation than is
already present from computing the quasi-diagonal pair $(X, D)$. The factor
$\mu$ may be minimized by decreasing $j$, the number of columns locked.

4.  A conservative strategy locks only one vector at a time. The only real concern
arises when locking two vectors corresponding to a complex conjugate pair. If the
real and imaginary parts of the complex eigenvector are nearly aligned, $\mu$ will
be large and locking may be unstable. But as § 6.3.1 explains, the complex
conjugate pair may be numerically regarded as a double eigenvalue with zero
imaginary part. Only one copy is deflated and $\mu \approx 1$.
Proof

Partition $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $D = \begin{bmatrix} D_1 & 0 \\ 0 & D_2 \end{bmatrix}$. The $j$ columns of $X_1$ are a basis
for the right eigenspace to be locked and $D_1$ contains the corresponding eigenvalues.
We assume that the eigenvalues of $D_1$ and $D_2$ are distinct and that $X$ is non-singular.

Let $Y^T = \begin{bmatrix} Y_1^T \\ Y_2^T \end{bmatrix}$ denote the inverse of $X$. The rows of $Y_1^T$ span the left eigenspace
associated with $X_1$ and $D_1$.

Let the product $\hat{Q}\hat{R}$ be an exact QR factorization of a matrix near $X_1$:

$$\hat{Q}\hat{R} = \begin{bmatrix} \hat{Q}_1 & \hat{Q}_2 \end{bmatrix} \begin{bmatrix} \hat{R}_1 \\ 0 \end{bmatrix} = X_1 + \hat{E},$$


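where $\|\hat{E}\| \le \epsilon_2 \|X_1\|$. Using Theorem 1.1 of Stewart [89], since $\|R_1^{-1}\|\,\|\hat{E}\| \le \eta < 1$
there exist matrices $W_1 \in \mathbb{R}^{m \times j}$ and $F_1 \in \mathbb{R}^{j \times j}$ such that $(Q_1 + W_1)(R_1 + F_1) = \hat{Q}_1 \hat{R}_1$,
where $QR = \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \begin{bmatrix} R_1 \\ 0 \end{bmatrix} = X_1$ and $(Q_1 + W_1)^T (Q_1 + W_1) = I_j$. Define
$F = \begin{bmatrix} F_1 \\ 0 \end{bmatrix}$ and $W = \begin{bmatrix} W_1 & 0 \end{bmatrix}$. The matrices $W$ and $F$ are the perturbations that
account for the backward error $\hat{E}$ produced by computation.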
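Partitioning $W$ conformably with $Q$ gives

$$\begin{aligned}
\hat{Q}^T H \hat{Q} &= \hat{Q}^T X D Y^T \hat{Q} - \hat{Q}^T E \hat{Q}, \\
&= \hat{Q}^T (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \hat{Q} - \hat{Q}^T E \hat{Q}, \\
&= \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \begin{bmatrix} Q_1 & Q_2 \end{bmatrix}
 + W^T (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) \begin{bmatrix} Q_1 & Q_2 \end{bmatrix} \\
&\quad + \begin{bmatrix} Q_1^T \\ Q_2^T \end{bmatrix} (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) W - \hat{Q}^T E \hat{Q}, \qquad (6.4.2)
\end{aligned}$$

where the second-order terms involving $W$ are ignored. From the decomposition
$X_1 = Q_1 R_1$ it follows that $Q_1 = X_1 R_1^{-1}$, which gives $Q_2^T X_1 = 0$. The equality
$Y^T = X^{-1}$ implies that $Y_l^T X_l = I$ for $l = 1, 2$ and $Y_2^T X_1 = 0 = Y_1^T X_2$, and hence

$$Y_2^T Q_1 = 0.$$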

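Using these relationships, equation (6.4.2) becomes

(6.4.3)  $\hat{Q}^T H \hat{Q} = \begin{bmatrix} R_1 D_1 R_1^{-1} & Q_1^T X D Y^T Q_2 \\ 0 & Q_2^T X_2 D_2 Y_2^T Q_2 \end{bmatrix} + \hat{C},$

(6.4.4)  $\phantom{\hat{Q}^T H \hat{Q}} = \hat{H} + \hat{C},$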

where the matrix $\hat{C}$ absorbs the three matrix products involving $W$ or $E$ on the right
hand side of equation (6.4.2). We note that if $H$ is symmetric, $Q_1^T X_2 = 0 = Y_1^T Q_2$, $R_1$
is a diagonal matrix and hence $R_1 D_1 R_1^{-1} = D_1$. Thus $\hat{H}$ is also a symmetric matrix.
Defining $C = \hat{Q}\hat{C}\hat{Q}^T$, equation (6.4.4) is rewritten as $\hat{Q}^T (H - C)\hat{Q} = \hat{H}$. Since
$Q\hat{H} = (X_1 D_1 Y_1^T + X_2 D_2 Y_2^T) Q$ and using the definition of $\hat{C}$ from equation (6.4.2),

(6.4.5)  $\hat{C} = W^T Q \hat{H} + \hat{H} Q^T W - \hat{Q}^T E \hat{Q},$

it follows that $\|\hat{C}\| \le 2\|W^T Q\|\,\|\hat{H}\| + \|E\|$. The result of Theorem 1.1 of Stewart [89]
also allows the estimate

$$\|W^T Q\| \le \|W\| \le \epsilon_2 \mu (1 + \epsilon_2 \mu \kappa(R_1)),$$

where $O(\epsilon^3)$ terms are ignored. For modest values of $\mu$, $W$ is numerically orthogonal
to $Q$. From equation (6.4.5)

$$\begin{aligned}
\|C\| = \|\hat{C}\| &\le 2\epsilon_2 \mu (1 + \epsilon_2 \mu \kappa(R_1)) \|\hat{H}\| + \epsilon_1 \kappa(X) \|H\|, \\
&\le 2\epsilon_2 \mu (1 + \epsilon_2 \mu \kappa(R_1)) (\|H\| + \|\hat{C}\|) + \epsilon_1 \kappa(X) \|H\|, \\
&\le \epsilon(\kappa(X) + \mu(1 + \epsilon\mu\kappa(R_1))) \|H\| + \epsilon\mu(1 + \epsilon\mu\kappa(R_1)) \|\hat{C}\|, \\
&= \eta \|H\| + \hat{\eta} \|\hat{C}\|,
\end{aligned}$$

where the second inequality uses equation (6.4.4). Since $\hat{\eta} < \eta$, rearranging the last
inequality gives $\|\hat{C}\|(1 - \eta) \le \eta \|H\|$. Ignoring $O(\eta^2)$ terms, $\|\hat{C}\| \le \eta \|H\|$. The
estimate on the size of $C$ in equation (6.4.1) now follows since
$\eta = \epsilon(\kappa(X) + \mu(1 + \epsilon\mu\kappa(R_1))) \le \epsilon(\kappa(X) + \mu) + O(\epsilon^2)$.  □

6.4.2  Purging 

The success of the purging scheme depends upon the solution of the Sylvester set of
equations required by Algorithm 6.3. We rewrite the Sylvester set of equations in
Algorithm 6.3 as $Z H_{22} - H_{11} Z = H_{12}$. The job is to examine the effect of performing
the similarity transformation $R H_{22} R^{-1}$ where $QR = S$.
The last relation implies that $R^{-1}$ is available as a block of $Q$. In actual computation, this equality obviates
the need to solve linear systems with $R$ necessary for the similarity transformation.
For the error analysis that follows, $R^{-1}$ is used in a formal sense.
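The relation behind purging can be checked on the smallest possible case. The sketch below is only an illustration: it assumes a 2 x 2 upper triangular $H$, so that $H_{11}$, $H_{12}$ and $H_{22}$ are scalars, and it assumes that $S$ stacks $Z$ on top of an identity block. The helper names are hypothetical and are not taken from Algorithm 6.3. Exact rational arithmetic is used so that the invariant subspace relation $HS = SH_{22}$ holds without roundoff.

```python
from fractions import Fraction


def solve_scalar_sylvester(h11, h12, h22):
    """Solve z*h22 - h11*z = h12 when all three blocks are scalars."""
    if h11 == h22:
        raise ValueError("the eigenvalues of H11 and H22 must be distinct")
    return h12 / (h22 - h11)


def matvec(H, x):
    """Dense matrix-vector product for small lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in H]


# A 2 x 2 block upper triangular H with scalar blocks (illustrative values).
h11, h12, h22 = Fraction(1), Fraction(3), Fraction(4)
H = [[h11, h12], [Fraction(0), h22]]

z = solve_scalar_sylvester(h11, h12, h22)

# Assumed form S = [z; 1]: its column spans an invariant subspace of H
# associated with the eigenvalue h22, so H*s - s*h22 should vanish.
s = [z, Fraction(1)]
Hs = matvec(H, s)
residual = [Hs[0] - h22 * s[0], Hs[1] - h22 * s[1]]

# In the scalar case the separation is |h22 - h11| and |z| <= |h12| / sep.
sep = abs(h22 - h11)
```

In this scalar setting the solution grows as the two eigenvalues approach each other, which is the mechanism by which a small separation makes purging less reliable.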

Let $\hat{Z}$ be the computed solution to the Sylvester set of equations. In a similar
analysis, Bai and Demmel [9] assume that the QR factorization of $S$ is performed
exactly and we do also. The major source of error is that arising from computing $\hat{Z}$.

Suppose that $\hat{Q}\hat{R} = \hat{S}$. Write $\hat{Z} = Z + E$ where $E$ is the error in $\hat{Z}$. If
$QR = S$ and $\|R^{-1}\|\,\|E\| < 1$, then Theorem 1.1 of Stewart [89] gives matrices $W$ and
$F$ such that $(Q + W)(R + F) = \hat{Q}\hat{R}$ where $(Q + W)^T (Q + W) = I_m$. The result gives
the bound $\|F\| \le \|R\|\,\|E\| + O(\|E\|^2)$. Up to first order perturbation terms,

$$\hat{R} H_{22} \hat{R}^{-1} = (R + F) H_{22} (R + F)^{-1} = R H_{22} R^{-1} - R H_{22} R^{-1} F R^{-1} + F H_{22} R^{-1}.$$

Defining the error matrix $C = R^{-1} F H_{22} - H_{22} R^{-1} F$ it follows that

$$\hat{R} H_{22} \hat{R}^{-1} = R (H_{22} + C) R^{-1}.$$

Ignoring second-order terms, we obtain the estimate

$$\|C\| \le 2\|R^{-1}\|\,\|F\|\,\|H_{22}\| \le 2\kappa(S)\,\|E\|\,\|H_{22}\|.$$

The invariance of $\|\cdot\|$ under orthogonal transformations gives $\kappa(S) = \|R^{-1}\|\,\|R\|$.
Since the singular values of $S$ are the square roots of the eigenvalues of $S^T S$ it follows
that

$$\kappa(S) = \sqrt{\frac{1 + \sigma_{\max}^2(Z)}{1 + \sigma_{\min}^2(Z)}},$$

where $\sigma_{\max}(Z)$ and $\sigma_{\min}(Z)$ are the largest and smallest singular values of $Z$. Since
$Z^T Z$ is a symmetric positive semi-definite matrix, $\lambda_{\max}(Z^T Z) = \|Z\|^2$, and then
$\kappa(S) \le \sqrt{1 + \|Z\|^2}$, with equality if zero is an eigenvalue of $Z^T Z$.

The  previous  discussion  is  summarized  in  the  following  result. 


Theorem 6.5 Let $\hat{Z}$ be the computed solution to the Sylvester set of
equations, $Z H_{22} - H_{11} Z = H_{12}$, where the eigenvalues of $H_{11}$ and $H_{22}$
are distinct. Let $\hat{Z} = Z + E$ where $E$ is the error in $\hat{Z}$ and suppose that
$\|R^{-1}\|\,\|E\| < 1$ where $QR = S$.
Then there exists a matrix $C$ such that

$$\hat{R} H_{22} \hat{R}^{-1} = R (H_{22} + C) R^{-1},$$

where

(6.4.6)  $\|C\| \le 2\sqrt{1 + \|Z\|^2}\,\|E\|\,\|H_{22}\|.$


If $\|E\|$ is a modest multiple of machine precision and the solution of the Sylvester
equations is not large in norm, then purging is backward stable since $\|C\|$ is small
relative to $\|H\|$.

The two standard approaches [11, 36] for solving Sylvester's equation show that
$\|F\|_F \le \epsilon_3 (\|H_{11}\|_F + \|H_{22}\|_F) \|\hat{Z}\|_F$ where $F = H_{12} - \hat{Z} H_{22} + H_{11} \hat{Z}$ and $\epsilon_3$ is a
modest multiple of machine precision. Standard bounds [18, 35] also give
$\|Z\|_F \le \mathrm{sep}^{-1}(H_{11}, H_{22}) \|H_{12}\|_F$ where

$$\mathrm{sep}(H_{11}, H_{22}) = \min_{X \ne 0} \frac{\|X H_{22} - H_{11} X\|_F}{\|X\|_F}$$

is the separation between $H_{11}$ and $H_{22}$. Although

$$\mathrm{sep}(H_{11}, H_{22}) \le \min_{k,l} |\lambda_k(H_{11}) - \lambda_l(H_{22})|,$$

Varah [94] indicates that if the matrices involved are highly non-normal, the smallest
difference between the spectra of $H_{11}$ and $H_{22}$ may be an overestimate of the actual
separation. Recently, Higham [40] gave a detailed error analysis for the solution of
Sylvester's equation. The analysis takes into account the special structure of the
equations involved. For example, Higham shows that $\|E\|_F \le \mathrm{sep}^{-1}(H_{11}, H_{22}) \|F\|_F$
but this may lead to an arbitrarily large estimate of the true forward error. For use
in practical error estimation, "LAPACK-style" software is available.

A robust implementation of procedure Lock determines the backward stability by
estimating both $\|Z\|$ and $\|E\|$.


6.5  Other  Deflation  Techniques 

Wilkinson [101, pages 584-602] has given a comprehensive treatment of various
deflation schemes associated with iterative methods. Recently, Saad [78, pages 117-125, 180-182]
discussed several deflation strategies used with both simultaneous
iteration and Arnoldi's method. Algorithm 6.2 is an in-place version of one of these
schemes [78, page 181]. Saad's version explicitly orthonormalizes the newly converged
Ritz vectors against the $j$ already computed approximate Schur vectors. This is the
form of locking used by Scott [80]. Instead, procedure Lock achieves the same task
implicitly through the use of Householder matrices in $\mathbb{R}^{m \times m}$. Thus we are able to
orthogonalize vectors in $\mathbb{R}^n$ at a reduced expense since $m \ll n$.

Other deflation strategies include the various Wielandt deflation techniques [78,
101]. We briefly review those that do not require the approximate left eigenvectors
of $A$ or complex arithmetic. Denote by $\lambda_1, \ldots, \lambda_j$ the wanted eigenvalues of $A$. The
Wielandt and Schur-Wielandt forms of deflation determine a rank $j$ modification of
$A$,

(6.5.1)  $A_j = A - U_j S_j U_j^T,$

where $S_j \in \mathbb{R}^{j \times j}$ and $j$ represents the dimension of the approximate invariant
subspace already computed. The idea is to choose $S_j$ so that an iteration with $A_j$ will converge to the
remainder of the invariant subspace desired. For example, $S_j$ is selected to be a
diagonal matrix of shifts $\sigma_1, \ldots, \sigma_j$ so that $A_j$ has eigenvalues
$\{\lambda_1 - \sigma_1, \ldots, \lambda_j - \sigma_j, \lambda_{j+1}, \ldots, \lambda_n\}$.

The two forms of deflation differ in the choice of $U_j$. The Wielandt variant uses
converged Ritz vectors while the Schur-Wielandt variant uses an approximate Schur basis
set of vectors. With either form of deflation, the eigenvalues of $A_j$ are $\lambda_i - \sigma_i$ for $i \le j$
and $\lambda_i$ otherwise, and both forms leave the Schur vectors unchanged. This motivates
Saad to suggest that an approximate Schur basis should be incrementally built as Ritz
vectors of $A_j$ converge. Braconnier [10] employs the Wielandt variant and discusses
the details of deflating a converged Ritz value that has nonzero imaginary part in real
arithmetic.
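The eigenvalue shift produced by a rank one deflation of this kind can be verified on a small case. The following sketch uses a hypothetical nonsymmetric 2 x 2 matrix with eigenvalues 2 and 1 and a single shift $\sigma = 5$; the matrix, the shift and the helper name are assumptions chosen only so that every quantity is an exact rational number. It checks that $\lambda_1 - \sigma$ and $\lambda_2$ are roots of the characteristic polynomial of $A - \sigma u u^T$, where $u$ is the unit Schur vector belonging to $\lambda_1$.

```python
from fractions import Fraction as Fr


def char_poly_2x2(M, t):
    """Evaluate det(M - t*I) for a 2 x 2 matrix stored as nested lists."""
    return (M[0][0] - t) * (M[1][1] - t) - M[0][1] * M[1][0]


# Hypothetical test matrix: A*(1, 1)^T = 2*(1, 1)^T, eigenvalues 2 and 1.
A = [[Fr(2), Fr(0)], [Fr(1), Fr(1)]]
lam1, lam2 = Fr(2), Fr(1)

# The unit Schur vector for lam1 is u = (1, 1)/sqrt(2), so u*u^T has all
# entries equal to 1/2 and no square root is needed.
uuT = [[Fr(1, 2), Fr(1, 2)], [Fr(1, 2), Fr(1, 2)]]

sigma = Fr(5)  # illustrative shift
A1 = [[A[i][j] - sigma * uuT[i][j] for j in range(2)] for i in range(2)]

# The deflated matrix should have eigenvalues lam1 - sigma and lam2.
shifted_root = char_poly_2x2(A1, lam1 - sigma)
kept_root = char_poly_2x2(A1, lam2)
```

Both `shifted_root` and `kept_root` evaluate to zero here, so only the eigenvalue associated with the deflated Schur vector has moved.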

We now compare our locking scheme to the Schur-Wielandt deflation techniques.
We shall assume that $A U_j = U_j R_j$ is a real partial Schur form of order $j$ for $A$ and
we will put $S_j = R_j$ in the Schur-Wielandt deflation scheme. Suppose that

(6.5.2)  $A \begin{bmatrix} U_j & V_m \end{bmatrix} = \begin{bmatrix} U_j & V_m \end{bmatrix} \begin{bmatrix} R_j & M_j \\ 0 & H_m \end{bmatrix} + f_{m+j} e_{m+j}^T$

is a length $m + j$ Arnoldi factorization obtained after locking. Consider any
associated roundoff errors as being absorbed in $A$ here. Equate the last $m$ columns of
equation (6.5.2) to obtain

(6.5.3)  $A V_m = U_j M_j + V_m H_m + f_{m+j} e_m^T.$
Since $U_j$ is orthogonal to $V_m$, it follows that
$(I - U_j U_j^T) A (I - U_j U_j^T) V_m = V_m H_m + f_{m+j} e_m^T$. This implies that the Arnoldi factorization (6.5.2) is equivalent to
applying Arnoldi's method to the projected matrix $(I - U_j U_j^T) A (I - U_j U_j^T)$ with the first
column of $V_m$ as the starting vector. Keeping the locked vectors active in the
construction and the IRA update of this Arnoldi factorization assures that the Krylov space
generated by $V_m$ remains free of components corresponding to locked Ritz values.
The appearance of spurious Ritz values in the subsequent factorization is
automatically avoided. Note that when $A$ is symmetric, this is equivalent to the selective
orthogonalization [61, pages 275-284] scheme proposed by Parlett and Scott.

In contrast to locking, consider the consequences of applying the Schur-Wielandt
deflation scheme to construct a new Arnoldi factorization using $V_m e_1$ as a starting
vector. In the symmetric case with exact arithmetic, the two schemes would be
mathematically equivalent. Without these assumptions, there may be considerable
differences. From equation (6.5.3), it follows that

(6.5.4)  $(A - U_j R_j U_j^T) V_m = A (I - U_j U_j^T) V_m = U_j M_j + V_m H_m + f_{m+j} e_m^T.$

From equation (6.5.4) we can use an easy induction to derive the relations

$$(A - U_j R_j U_j^T)^i V_m e_1 = (U_j M_j + V_m H_m) H_m^{i-1} e_1, \quad i \ge 1.$$

Thus, the Krylov subspace $\mathcal{K}_k(A - U_j R_j U_j^T, V_m e_1)$ and hence the corresponding
Arnoldi factorization of $A - U_j R_j U_j^T$ must be corrupted with components in $\mathcal{R}(U_j)$
even when the starting vector is orthogonal to $\mathcal{R}(U_j)$. Within the context of Arnoldi
iterations, the Schur-Wielandt techniques do not deflate the invariant subspace
information contained in $\mathcal{R}(U_j)$ from the remainder of the iteration.

This helps to explain why Saad suggests that Wielandt and Schur-Wielandt
deflation techniques should not be used "to compute more than a few eigenvalues and
eigenvectors."† We note that if $M_j = 0$, then the Wielandt forms of deflation may
safely be used within an Arnoldi iteration. This will always be true when $A$ is
symmetric.

The cost of matrix vector products with $A_j$ increases due to the rank $j$
modifications of $A$ required. Moreover, every time an approximate Schur vector or a
Ritz vector converges, the iteration needs to be explicitly restarted with $A_j$. The
two deflation techniques introduced in this paper allow the iteration to be implicitly
restarted, avoiding the need to build a new factorization from scratch.

†Page 125 of [78]

Finally, we mention that the idea of deflating a converged Ritz value from a
Lanczos iteration is also discussed by Parlett and Nour-Omid [64]. They present an
explicit deflation technique by using the QR algorithm with converged Ritz values as
shifts. Parlett indicates that this was a primary reason for undertaking the study
concerning the forward instability of the QR algorithm [63].


6.6  Numerical  Results 

An  IRA-iteration  using  the  two  deflation  procedures  of  section  6.3.2  was  written  in 
MATLAB,  Version  4.2a.  An  informal  description  given  parameters  k  and  p  is  given  in 
Table  6.1.  The  codes  are  available  from  the  author  upon  request.  A  high-quality  and 
robust  implementation  of  the  deflation  procedures  is  planned  for  the  Fortran  software 
package  ARPACK  [49]. 

In the examples that follow $Q_k$ and $R_k$ denote the approximate Schur factors
for an invariant subspace of order $k$ computed by an IRA-iteration. All the
experiments used the starting vector equal to randn(n, 1) where the seed is set with
randn('seed', 0) and n is the order of the matrix. The shifting strategy uses the
unwanted eigenvalues of $H_{k+p}$ that have not converged. An eigenpair $(\theta, s)$ of $H_{k+p}$
is accepted if its Ritz estimate (2.5.1) satisfies

(6.6.1)  $\|f_{k+p}\|\,|e_{k+p}^T s| \le \epsilon |\theta|.$

The value of $\epsilon$ is chosen according to the relative accuracy of the Ritz value desired.
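The acceptance test (6.6.1) needs only three scalars per Ritz pair. The sketch below is a minimal illustration, assuming that the residual norm, the last component of the eigenvector $s$ of $H_{k+p}$ and the Ritz value $\theta$ are already available; the function names and the numerical values are hypothetical and are not taken from the MATLAB codes described above.

```python
def ritz_estimate(f_norm, s_last):
    """Ritz estimate ||f|| * |e^T s|, with s_last the last entry of s."""
    return f_norm * abs(s_last)


def is_converged(f_norm, s_last, theta, eps):
    """Relative acceptance test: Ritz estimate <= eps * |theta|."""
    return ritz_estimate(f_norm, s_last) <= eps * abs(theta)


# Illustrative values only. A tiny last component of s signals convergence
# even though the residual norm ||f|| itself is not small.
accepted = is_converged(f_norm=1e-3, s_last=1e-9, theta=2.0, eps=1e-10)
rejected = is_converged(f_norm=1e-3, s_last=1e-4, theta=2.0, eps=1e-10)

# abs() also handles a complex Ritz value, as arises for nonsymmetric A.
accepted_complex = is_converged(1e-3, 1e-9, complex(0.0, 2.0), 1e-10)
```

Because the test is relative to $|\theta|$, a Ritz value near zero is held to a proportionally tighter absolute tolerance.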


6.6.1  Example  1 


The  first  example  illustrates  the  use  of  the  deflation  techniques  when  the  underlying 
matrix  has  several  complex  repeated  eigenvalues.  The  example  also  demonstrates 
how  the  iteration  locks  and  purges  blocks  of  Ritz  values  in  real  arithmetic.  A  block 
diagonal  matrix  C  was  generated  having  n  blocks  of  order  two.  Each  block  was  of 
the  form 

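$$\begin{bmatrix} \xi_l & \eta_l \\ -\eta_l & \xi_l \end{bmatrix}$$

where

$$\xi_l = 4\sin^2\!\left(\frac{i\pi}{2(n+1)}\right) + 4\sin^2\!\left(\frac{j\pi}{2(n+1)}\right), \qquad l = i + j - 1,$$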
1.  Initialize an Arnoldi factorization of length k

2.  Main Loop

3.      Extend an Arnoldi factorization to length k + p

4.      Check for convergence
            Exit if k wanted Ritz values converge
            Let i and j denote the numbers of wanted and unwanted
            converged Ritz values, respectively

5.      Lock the i + j converged Ritz values

6.      Implicit application of shifts resulting in an
        Arnoldi factorization of length k + j

7.      Purge the j unwanted converged Ritz values

Table 6.1 Formal description of an IRA-iteration


for $1 \le i, j \le n$ and $\eta_l = \sqrt{\xi_l}$. The eigenvalues of $C$ are $\xi_l \pm \eta_l \mathrm{i}$ where $\mathrm{i} = \sqrt{-1}$. Since
the eigenvalues of a quasi-diagonal matrix are invariant under orthogonal similarity
transformations, using an IRA-iteration on $C$ with a randomly generated starting
vector is general. An IRA-iteration was used to compute the $k = 12$ eigenvalues
of $C_{450}$ with smallest real part. The number of shifts used was $p = 16$ and the
convergence tolerance $\epsilon$ was set equal to $10^{-10}$. With these choices of $k$ and $p$, the
iteration stores at most twenty eight Arnoldi vectors.

There are four eigenvalues with multiplicity two. Table 6.2 shows the results
attained. Let the diagonal matrix $D_{12}$ denote the eigenvalues of the upper triangular
matrix $R_{12}$ computed by the iteration. The diagonal matrix $\Lambda_{12}$ contains the wanted
eigenvalues. After twenty four iterations twelve Ritz values converged. But the pair
of Ritz values purged at iteration twenty one was a previously locked value which the
iteration discarded. This behavior is typical when there are clusters of eigenvalues.

6.6.2  Example  2 

Consider  the  eigenvalue  problem  for  the  convection-diffusion  operator, 


$$\Delta u(x, y) + \rho\,(u_x(x, y) + u_y(x, y)) = \lambda u(x, y),$$

on the unit square $[0, 1] \times [0, 1]$ with zero boundary data. Using a standard five-point
scheme with centered finite differences, the matrix $L_{n^2}$ that arises from the
discretization is of order $n^2$ where $h = 1/(n + 1)$ is the cell size. The eigenvalues of
$L_{n^2}$ are

$$\lambda_{ij} = 2\sqrt{1 - \gamma}\,\cos\!\left(\frac{i\pi}{n+1}\right) + 2\sqrt{1 - \gamma}\,\cos\!\left(\frac{j\pi}{n+1}\right),$$

for $1 \le i, j \le n$ where $\gamma = \rho h/2$. An IRA-iteration was used to compute the $k = 6$
smallest eigenvalues of $L_{625}$ where $\rho = 25$. The number of shifts used was $p = 10$ and
the convergence tolerance $\epsilon$ was set equal to $10^{-8}$. With these choices of $k$ and $p$, the
iteration stores at most sixteen Lanczos vectors. Let the diagonal matrix $D_6$ denote
the eigenvalues of the upper triangular matrix $R_6$ computed by the iteration. The
diagonal matrix $\Lambda_6 \in \mathbb{R}^{6 \times 6}$ contains the six smallest eigenvalues. We note that there
are two eigenvalues with multiplicity two. Table 6.3 shows the results attained. The
diagonal matrix $D_6$ approximates $\Lambda_6$. After thirty iterations six Ritz values converged.
But the Ritz value purged at iteration twenty four was a previously locked value. The
other purged Ritz values are approximations to the eigenvalues of $L_{625}$ larger than
$\lambda_6$.

Table 6.2 Convergence history for Example one

Figure 6.3 gives a graphical interpretation of the expense of an IRA-iteration in
terms of matrix vector products when the value of $p$ is increased. For all values of
$p$ shown, the results of the iteration were similar to those of Table 6.3. The results
presented in Table 6.3 correspond to the value of $p$ that gave the minimum number
of matrix vector products. For the value $p = 1$, the iteration converged to the five
smallest eigenvalues after nine hundred ninety nine matrix vector products. But the
iteration was not able to converge to the second copy of $\lambda_5$. For $p = 2$, the only form
of deflation employed was locking. All other values of $p$ shown demonstrated similar
behavior to that of Table 6.3.

In order to determine the benefit of the two deflation techniques, experiments were
repeated without the use of locking or purging. In addition, all the unwanted Ritz
values were used as shifts, converged or not. The first run used the same parameters
as given in Table 6.3. After 210 matrix vector products, the iteration converged to
six Ritz values. But the second copy of the fifth smallest eigenvalue was not among
the final six. The value of $p$ was increased to twenty three with the same results.
IRA-iteration on $L_{625}$
$k = 6$ and $p = 10$ with convergence tolerance $\epsilon = 10^{-8}$

Iteration    Ritz values Locked    Ritz values Purged
   14                1                     0
   16                1                     0
   19                1                     0
   21                1                     0
   23                1                     1
   24                0                     1
   30                1                     0
   35                0                     1
   38                1                     1
Totals               7                     4

Number of matrix vector products    325

$\|L_{625} Q_6 - Q_6 R_6\| \approx 10^{?}$
$\|Q_6^T L_{625} Q_6 - R_6\| \approx 10^{?}$
$\|Q_6^T Q_6 - I_6\| \approx 10^{-14}$
$\|D_6 - \Lambda_6\|_\infty \approx 10^{-7}$

Table 6.3 Convergence history for Example two
Figure 6.3 Bar graph of the number of matrix vector products used
by an IRA-iteration for Example 2 as a function of $p$.


6.6.3  Example  3 


The following example shows the behavior of the iteration on a matrix with a very ill
conditioned basis of eigenvectors. Define the Clement tridiagonal matrix [41] of order
$n + 1$

$$B_{n+1} = \begin{bmatrix} 0 & n & & & \\ 1 & 0 & n-1 & & \\ & 2 & \ddots & \ddots & \\ & & \ddots & 0 & 1 \\ & & & n & 0 \end{bmatrix}.$$

The eigenvalues are $\pm n, \pm(n - 2), \ldots$, ending with $\pm 1$ when $n$ is odd and with $\pm 2$ and zero when $n$ is even. We note that
$B_{n+1} = S_{n+1} A_{n+1} S_{n+1}^{-1}$ where $S_{n+1}^2 = \mathrm{diag}(1, \tfrac{1}{n}, \ldots, \tfrac{n!}{n!})$ is a diagonal matrix. Thus the
condition number of the basis of eigenvectors for $B_{n+1}$ is $\|S_{n+1}\|\,\|S_{n+1}^{-1}\|$, which implies
that the eigenvalue problem for $B_{n+1}$ is quite ill conditioned. An IRA-iteration was
used to compute the $k = 4$ largest in magnitude eigenvalues of $B_{1000}$. The number
of shifts used was $p = 16$ and the convergence tolerance $\epsilon$ was set equal to $10^{-6}$.
With these choices of $k$ and $p$, the iteration stores at most twenty Arnoldi vectors.
Let the diagonal matrix $D_4$ denote the eigenvalues of the upper triangular matrix $R_4$
computed by the iteration. The diagonal matrix $\Lambda_4 \in \mathbb{R}^{4 \times 4}$ contains the four largest
in magnitude eigenvalues. Table 6.4 shows the results attained.

Although the iteration needed a large number of matrix vector products, the
iteration was able to extract accurate Ritz values given the convergence tolerance.
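The eigenvalues quoted for the Clement matrix can be confirmed without an eigensolver. The sketch below is an illustration under stated assumptions: it takes the superdiagonal to be $n, n-1, \ldots, 1$ and the subdiagonal to be $1, 2, \ldots, n$, as in the display above, and it evaluates $\det(B_{n+1} - \lambda I)$ with the usual three-term recurrence for the leading principal minors of a tridiagonal matrix. Integer arithmetic keeps the test exact. The function names are hypothetical.

```python
def clement_offdiagonals(n):
    """Super- and subdiagonal of the order n+1 Clement matrix."""
    superdiag = [n - k for k in range(n)]  # n, n-1, ..., 1
    subdiag = [k + 1 for k in range(n)]    # 1, 2, ..., n
    return superdiag, subdiag


def char_poly_at(n, lam):
    """det(B_{n+1} - lam*I) via the tridiagonal determinant recurrence."""
    superdiag, subdiag = clement_offdiagonals(n)
    d_prev, d_curr = 1, -lam  # leading minors of order 0 and 1
    for k in range(n):
        # The diagonal is zero, so each step multiplies by -lam and
        # subtracts the product of the two off-diagonal entries.
        d_prev, d_curr = d_curr, -lam * d_curr - superdiag[k] * subdiag[k] * d_prev
    return d_curr


def expected_eigenvalues(n):
    """The values -n, -n+2, ..., n-2, n stated in the text."""
    return list(range(-n, n + 1, 2))


# Order 5 (n = 4): every claimed eigenvalue is a root.
roots_ok = all(char_poly_at(4, lam) == 0 for lam in expected_eigenvalues(4))

# Order 1000 (n = 999): the four eigenvalues largest in magnitude.
four_largest = sorted(expected_eigenvalues(999), key=abs)[-4:]
largest_ok = all(char_poly_at(999, lam) == 0 for lam in four_largest)
```

For $B_{1000}$ the four wanted eigenvalues are therefore $\pm 999$ and $\pm 997$, which gives a reference against which $D_4$ can be compared.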

6.6.4  Example  4 

Finally, we present a dramatic example of how the convergence of an IRA-iteration
benefits from the two deflation procedures. A matrix $T$ of order ten had the values

$$v_1 = 10^{-6}, \quad v_{i=2:8} = i \cdot 10^{-3}, \quad v_{9:10} = 1,$$

on the diagonal. Since the eigenvalues of a matrix are invariant under orthogonal
similarity transformations, using an IRA-iteration on $T$ with a randomly generated
starting vector is general. An IRA-iteration was used to compute an approximation
to the smallest eigenvalue. The number of shifts used was $p = 3$ and the convergence
tolerance $\epsilon$ was set equal to $10^{-3}$. Table 6.5 shows the results attained.

Another experiment was run with the locking and purging mechanisms turned off.
Additionally, all unwanted Ritz values were used as shifts. The same parameters were
used as in Table 6.5 but the iteration now consumed forty one matrix vector products.
As in the results for Table 6.5, the modified iteration converged to one of the dominant
eigenvalues after one iteration. After six iterations, the leading block of $H_4$ split off,
having converged to the invariant subspace corresponding to $v_{9:10}$. But since purging
was turned off, the modified iteration had to continue attempting to converge to $v_1$
using only the lower block of order two in $H_4$. Incidentally, if the iteration instead
simply discarded the leading portion of the factorization corresponding to $v_{9:10}$ after
the sixth iteration, convergence to $v_1$ never occurred. Crucial to the success of an
IRA-iteration is the ability to deflate converged Ritz values in a stable manner. Both
purging and locking allow faster convergence.
IRA-iteration on $B_{1000}$
$k = 4$ and $p = 16$ with convergence tolerance $\epsilon = 10^{-6}$

Iteration    Ritz values Locked    Ritz values Purged
   76                1                     0
   85                1                     0
   91                2                     0
Totals               4                     0

Number of matrix vector products    1423

$\|B_{1000} Q_4 - Q_4 R_4\| / \|B_{1000}\| \approx 10^{-6}$
$\|Q_4^T B_{1000} Q_4 - R_4\| \approx 10^{-6}$
$\|Q_4^T Q_4 - I_4\| \approx 10^{-14}$
$\|D_4 - \Lambda_4\|_\infty / \|B_{1000}\|_\infty \approx 10^{-6}$

Table 6.4 Convergence history for Example three


IRA-iteration on $T$
$k = 1$ and $p = 3$ with convergence tolerance $\epsilon = 10^{-3}$

Iteration    Ritz values Locked    Ritz values Purged
    1                0                     1
   15                1                     1
Totals               1                     2

Number of matrix vector products    32

$\|T Q_1 - Q_1 R_1\| / v_1 \approx 10^{-3}$
$\|Q_1^T T Q_1 - R_1\| / v_1 \approx 10^{-3}$
$\|Q_1^T Q_1 - I_1\| \approx 10^{?}$
$\|D_1 - \Lambda_1\|_\infty / v_1 \approx 10^{-3}$

Table 6.5 Convergence history for Example four




Chapter  7 


Maintaining  Orthogonality  during  an 
IRA-iteration 


Probably the single most important factor governing the robust implementation of
an IRA-iteration is that of computing an orthogonal set of Arnoldi vectors defined
by the columns of $V_k$ in Algorithm 2.2 of Chapter 2. If Algorithm 2.2 of Chapter 2
is used to compute an Arnoldi factorization, a point is typically reached where the
columns of the Arnoldi matrix constructed will no longer be orthogonal to the
residual vector. Thus, we require a computational procedure that monitors the possible
loss of orthogonality in an inexpensive manner. Additionally, an efficient and stable
computational procedure is needed to enforce orthogonality when required.

The Arnoldi/Lanczos factorizations fell from favor among numerical analysts due
to the observed loss of orthogonality soon after their discovery. The work of Paige [57]
revived the Lanczos factorization since it explained the significance of the loss of
orthogonality that occurred in actual computation. This chapter introduces the
additional difficulties associated with nonsymmetric $A$ and reviews the ways in which
orthogonality may be enforced. We first explain the loss of orthogonality of an Arnoldi
factorization in § 7.1. The significance of the loss of orthogonality during the Lanczos
iteration is discussed in § 7.2. The different approaches used to ensure orthogonality
are surveyed in § 7.3.

7.1  Orthogonalization  and  the  Arnoldi  Factorization 

Computing the Arnoldi factorization in finite precision gives
(7.1.1)  $A \hat{V}_k = \hat{V}_k \hat{H}_k + \hat{f}_k e_k^T + R_k,$

where $R_k \in \mathbb{R}^{n \times k}$ accounts for the roundoff error and hatted quantities are computed
analogues of those in Algorithm 2.2. The residual $\hat{f}_k$ is the computed projection of
$A V_k e_k = A v_k$ onto the orthogonal complement of $\mathcal{R}(V_k)$: $f_k = (I - V_k V_k^T) A v_k$. Figure 7.1 shows this geometric
relationship. From Algorithm 2.2 of Chapter 2, the residual $f_k$ associated with the
length $k$ Arnoldi factorization becomes the $(k + 1)$-th Arnoldi vector.

A forward error analysis shows that $\|R_k\| = O(\epsilon_M)\|A\|$ where $\epsilon_M$ designates the
machine precision. Although equation (7.1.1) is an exact relationship it does not
follow that $\|\hat{V}_k^T \hat{f}_k\| = \|A\|\epsilon_k$ where $\epsilon_k \approx \epsilon_M$. A robust implementation computing
an Arnoldi factorization has $\hat{V}_k^T \hat{V}_k = I_k + E_k$ where $\|E_k\| \approx \epsilon_M$. Thus, the loss of
orthogonality may be studied by analyzing the construction of $\hat{f}_k$ and the resulting
vector $\hat{V}_k^T \hat{f}_k$. Numerical difficulties may be expected when $\hat{f}_k$ is nearly in $\mathcal{R}(V_k)$
or equivalently, the angle $\phi$ in Figure 7.1 is small.

7.2  Loss  of  Orthogonality 

As mentioned after Algorithm 2.2 of Chapter 2, a three term recurrence may be used
to compute the residual vector $f_k$ when $A$ is symmetric. Unfortunately, computing
in floating point arithmetic removes the possibility of an exact three term recurrence:
since the columns of $\hat{V}_k$ are only approximately orthogonal, the computed $\hat{f}_k$ depends
on all the columns of $\hat{V}_k$. The work of Paige [57] was the first to analyze the effects
of floating point arithmetic upon the Lanczos factorization. Bai [5] recently
analyzed the nonsymmetric Lanczos procedure. Both Paige and Bai demonstrate that a
loss of orthogonality is accompanied by a group of Ritz pairs emerging as excellent
approximations to eigenpairs of $A$. The Arnoldi factorization lacks a similar result.

If orthogonality is not enforced, as the Lanczos factorization is extended, further
copies of the "converged" Ritz values emerge. Determining whether these copies are
spurious or are actual eigenvalues of $A$ of multiplicity greater than one is not an easy
task. Cullum and Willoughby [21] present heuristics that attempt to distinguish the
spurious Ritz values from the actual multiple ones.

For symmetric $A$, Simon [81] presents a comprehensive study of the impact
orthogonalization methods have on the Lanczos iteration using the three term recurrence.
This includes the work of Parlett and Scott [66] on selective orthogonalization. The
analysis presented by Paige shows that the computed residual vector $\hat{f}_k$ loses the
most orthogonality in the direction of the Ritz vectors associated with the Ritz
values that are nearly eigenvalues of $A$. Selective orthogonalization is a strategy that
seeks to correct the loss of orthogonality in only these "converged" directions. We
remark that the locking of Ritz pairs with small Ritz estimates presented in Chapter 6
is also a selective orthogonalization method.
Figure 7.1 Projecting $A v_k = Av$ onto the column space
of $V_k = V$ and its orthogonal complement. The figure labels the vectors $Av$ and $f = Av - Vh$.


7.3  Practical  Implementations 

The problem of computing an orthogonal residual vector is equivalent to updating
the approximate QR factorization of

(7.3.1)  $\begin{bmatrix} V_k & A v_k \end{bmatrix} = \begin{bmatrix} V_k & v_{k+1} \end{bmatrix} \begin{bmatrix} I_k & h_{k+1} \\ 0 & \beta_{k+1} \end{bmatrix},$

where we use the notation of Algorithm 2.2 of Chapter 2. The factorization exists
as long as $\beta_{k+1}$ is not equal to zero. From Figure 7.1 we see that $\beta_{k+1} = \|A v_k\| \sin\phi$,
implying that for small $\phi$ the computed residual $\hat{f}_k$ has probably suffered cancellation.
We emphasize that this cancellation is responsible for the loss of orthogonality between
$V_k$ and $\hat{f}_k$ even though $\|\hat{f}_k - f_k\| = O(\epsilon_M)\|A v_k\|$. Theorems 1 and 2 in Hoffmann [42]
establish these important relationships.

There are several ways to compute a residual $\hat{f}_k$ that is numerically orthogonal
to the columns of $V_k$. Hoffmann [42] analyzes in detail iterative algorithms for
computing the QR factorization of a matrix using Gram-Schmidt methods. In the
special case where $V_k$ is a single vector, an unpublished result of Kahan found in
Parlett [61, pages 105-109] shows that orthogonality to working precision is
accomplished with at most one step of re-orthogonalization. The decision to perform a
re-orthogonalization is based on whether the cosine of the angle between the computed
projection $\hat{f}_k$ and $A v_k$ is less than some prescribed tolerance. This leads Saad [78,
page 177] along with Reichel and Gragg [67, page 372] to conjecture that at most one
step of re-orthogonalization suffices for the more general result of orthogonalizing one
vector against a group of others. Although widely believed to be true, there exists
no proof for how many re-orthogonalizations are required for the more general case
of orthogonalizing a vector against a group of others. For example, Bjorck [14] states
that Hoffmann's analysis proves that at most one re-orthogonalization suffices for
the more general case but no proof is offered. Hoffmann's extensive experimentation
never revealed the need for a second orthogonalization, but that this would always be
true was never rigorously justified.

The decision to perform another step of orthogonalization for the more general
case required by Algorithm 2.2 is essentially the same as for the two vector case.
If the ratio $\|f_k\|/\|Av_k\| = \sin\phi$ is less than a prescribed tolerance $\eta$ then a
re-orthogonalization of $f_k$ against all the columns of $V_k$ is performed. Performing the
first re-orthogonalization step for the Arnoldi factorization in equation (7.1.1) results
in

(7.3.2)  $AV_k = V_k(H_k + g_k e_k^T) + (f_k - V_k g_k)e_k^T + R_k$,

where $g_k = V_k^T f_k$. The goal is to force $\|V_k^T(f_k - V_k g_k)\| = O(\epsilon_M)\|f_k - V_k g_k\|$. We
remark that the eigenvalues of $H_k + g_k e_k^T$ now approximate those of A. The next
section considers determining whether this process needs to be repeated.

7.3.1  DGKS  Analysis  and  Method 

Daniel,  Gragg,  Kaufman,  and  Stewart  [22]  present  a  numerically  stable  algorithm  for 
updating  the  factorization  of  equation  (7.3.1).  Their  formulation  is  summarized  by 
Algorithm  7.1.  For  clarity,  the  subscripts  are  dropped  and  the  algorithm  is  used  at 
every step j of Algorithm 2.2 to compute a numerically orthogonal residual vector $f_j$.

Algorithm  7.1 

1.1  $h \leftarrow 0$ ;

1.2  $f \leftarrow Av$ ;

1.3  Begin loop ;

2.1  $w \leftarrow f$ ;

2.2  $g \leftarrow V^T w$ ;

2.3  $h \leftarrow h + g$ ;

2.4  $f \leftarrow w - Vg$ ;

1.4  Repeat loop until $\|f\| > \eta\|w\|$ ;

The loop in Algorithm 7.1 is entered a second time if the sine of the angle $\phi$ of
Figure 7.1, equivalently the cosine of the angle between $f$ and $Av$, is less than or
equal to $\eta$. The parameter $\eta$ is chosen to satisfy $0 < \eta < 1$.
Larger values of $\eta$ will result in more work while smaller values result in a relaxing of
the orthogonality between $V$ and the final $f$. Further iterations of the loop are only
required if the cosine of the angle between successive approximate residual vectors is
less than or equal to $\eta$. Intuitively, after the second pass through the loop, termination
depends upon successive approximate residual vectors being nearly aligned. Analysis
by Daniel et al. [22] shows that Algorithm 7.1 eventually terminates given some mild
assumptions on the model of floating point arithmetic used.
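
The loop of Algorithm 7.1 can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `dgks_residual`, the default tolerance, and the `max_passes` safety cap are choices made for this sketch and are not part of the algorithm as stated by Daniel et al. [22] or of the ARPACK source.

```python
import numpy as np

def dgks_residual(V, w, eta=1.0 / np.sqrt(2.0), max_passes=5):
    """Orthogonalize w against the orthonormal columns of V.

    Returns (h, f, passes) with w ~= V @ h + f and V.T @ f ~= 0.
    max_passes is a safety cap added for this sketch only.
    """
    h = np.zeros(V.shape[1])
    f = w.copy()
    passes = 0
    while True:
        w_cur = f                  # step 2.1
        g = V.T @ w_cur            # step 2.2: classical Gram-Schmidt coefficients
        h = h + g                  # step 2.3
        f = w_cur - V @ g          # step 2.4
        passes += 1
        # step 1.4: stop once the new residual is not much shorter than the old one
        if np.linalg.norm(f) > eta * np.linalg.norm(w_cur) or passes >= max_passes:
            return h, f, passes

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((200, 20)))
# A vector lying almost entirely in the column space of V, so that the
# first projection suffers severe cancelation.
w = V @ rng.standard_normal(20) + 1e-10 * rng.standard_normal(200)
h, f, passes = dgks_residual(V, w)
rel_orth = np.linalg.norm(V.T @ f) / np.linalg.norm(f)
print(passes, rel_orth)
```

The test vector is built so that the angle $\phi$ is tiny, which is the situation in which a single projection leaves a residual that is small in norm but not orthogonal to $V$ relative to its own length.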


7.3.2  Classical  and  Modified  Gram-Schmidt  Orthogonalization 


Algorithm 7.1 is an implementation of iterative classical Gram-Schmidt (CGS)
orthogonalization. It is well known that CGS orthogonalization is not a stable
algorithm for computing the QR factorization of a matrix. On the other hand, a simple
rearrangement of the CGS process, the modified Gram-Schmidt (MGS) algorithm, is
conditionally stable. If we denote the j-th column of $V_k$ by $v_j$, the MGS and CGS
algorithms for computing $f_k$ are mathematically equivalent to

(7.3.3)  $f_k \leftarrow (I_n - v_k v_k^T) \cdots (I_n - v_1 v_1^T) Av_k$,

(7.3.4)  $f_k \leftarrow (I_n - V_k V_k^T) Av_k$,

respectively. In exact arithmetic, both variants are the same. However, as Bjorck [13]
showed, the two may compute drastically different residual vectors in floating point
arithmetic.

Using MGS orthogonalization to compute the QR factorization of equation (7.3.1)
results in

$\|Q_{k+1}^T Q_{k+1} - I_{k+1}\| \approx \kappa(R_{k+1})\epsilon_M$,

where $Q_{k+1} = \begin{bmatrix} V_k & \beta_{k+1}^{-1} f_k \end{bmatrix}$ and $R_{k+1} = \begin{bmatrix} I_k & h_{k+1} \\ 0 & \beta_{k+1} \end{bmatrix}$,
and the condition number of $\begin{bmatrix} V_k & Av_k \end{bmatrix}$ is approximated by
$\kappa(R_{k+1}) = \|Av_k\|/\beta_{k+1}$. Thus, a small $\beta_{k+1}$ gives
a large condition number and hence MGS may not be stable.

However, Hoffmann's [42] analysis shows that the iterative versions of CGS and
MGS orthogonalization, i.e. performing re-orthogonalizations to ensure orthogonality,
are stable. From equation (7.3.4), the main computation of CGS orthogonalization
involves the matrix-vector products $h_{k+1} = V_k^T w_k$ and $f_k = w_k - V_k h_{k+1}$ where $w_k = Av_k$.
Hence, iterative CGS orthogonalization is better suited for vector and parallel computing
because of the matrix-vector products. In contrast, iterative MGS orthogonalization
involves a recurrence of vector-vector operations $\gamma_j = v_j^T w_k$ and $w_k \leftarrow w_k - v_j\gamma_j$ to compute
the residual.
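
The structural difference between equations (7.3.3) and (7.3.4) is easy to see side by side. The following NumPy sketch is illustrative only; the names `cgs_step` and `mgs_step` are introduced here and a single pass without re-orthogonalization is assumed.

```python
import numpy as np

def cgs_step(V, w):
    """One classical Gram-Schmidt pass, equation (7.3.4): two matrix-vector products."""
    h = V.T @ w
    return h, w - V @ h

def mgs_step(V, w):
    """One modified Gram-Schmidt pass, equation (7.3.3): a recurrence of vector operations."""
    f = w.copy()
    h = np.zeros(V.shape[1])
    for j in range(V.shape[1]):
        h[j] = V[:, j] @ f          # gamma_j = v_j^T f, using the updated f
        f = f - h[j] * V[:, j]
    return h, f

rng = np.random.default_rng(0)
V, _ = np.linalg.qr(rng.standard_normal((50, 5)))
w = rng.standard_normal(50)
h_cgs, f_cgs = cgs_step(V, w)
h_mgs, f_mgs = mgs_step(V, w)
```

For a well-conditioned example such as this one the two residuals agree to rounding error; the difference in numerical behavior only shows up when $w$ lies close to the column space of $V$.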

7.3.3  Using  Householder  Transformations 

Another alternative that must be mentioned is that of employing Householder
transformations as introduced by Walker [95]. Walker presents an algorithm for computing
a sequence of Householder matrices $P_1, \ldots, P_k$ so that a length k Arnoldi factorization
is constructed for $P_k \cdots P_1 A P_1 \cdots P_k$. Saad [78, page 177] compares the cost of the
two Gram-Schmidt variants with Walker's Householder approach. For modest values
of k, the Householder approach requires about twice as many floating point
operations as the iterative Gram-Schmidt one if no re-orthogonalizations are required, an
unlikely occurrence. The costs of the Householder and the iterative Gram-Schmidt
orthogonalization methods for computing a length k Arnoldi factorization are roughly the same
when every column of $V_k$ requires a re-orthogonalization. In addition, Walker
considers the efficient implementation of the Householder approach on a parallel machine.
Further study is needed to determine the comparative numerical behavior as well as
the efficiency of the competing orthogonalization algorithms.

7.3.4  ARPACK  Software 

The ARPACK [49] software currently uses CGS with possible re-orthogonalization at
each step. This remains feasible within an IRA-iteration since storage requirements
for the Arnoldi basis vectors may be fixed in advance of the iteration. The
implementation is efficient since the level 2 BLAS [26] are employed. The matrix-vector
multiplications often allow the underlying architecture of the computer to be more
efficiently utilized. Parallel and vector computers exemplify this behavior.

The actual choice for the parameter $\eta$ is as follows. When A is symmetric, the value
$\eta = 0.5 = \sin(\pi/6)$ results in a good compromise between maintaining an orthogonal
set of Lanczos vectors and avoiding an unnecessary number of re-orthogonalizations. For
nonsymmetric A, the value $\eta = 1/\sqrt{2} = \sin(\pi/4)$ achieved the same goal. Work is
underway to better understand the selection of $\eta$ and its impact upon the numerical
orthogonality of the Arnoldi vectors.




Chapter  8 


Some Practical Aspects for the Convergence of an IRA-iteration


The  determination  of  the  parameters  k  and  p  needed  during  an  IRA-iteration  requires 
further analysis as mentioned at the end of § 4.2 of Chapter 4. The value of k is
typically  the  number  of  eigenvalues  of  A  requiring  approximation.  At  present,  there 
is  no  a-priori  analysis  to  guide  the  selection  of  p  relative  to  k.  Increasing  p  relative 
to  k  usually  decreases  the  required  number  of  matrix  vector  products  with  A  needed 
by  Algorithm  4.2  but  it  also  increases  the  work  and  storage  required  to  maintain  the 
orthogonal  Arnoldi  basis  vectors.  The  optimal  cross-over  value  of  p  depends  upon 
A’s  spectral  properties  and  the  underlying  computer  system. 

One of the goals of this chapter is to present some heuristics and formal
analysis that help in selecting p relative to k. A connection was made between an
implicitly shifted QR-iteration and the IRA-iteration in Chapters 3 and 4. There is
also a well known connection between simultaneous, or subspace, iteration and the
QR-iteration. Subspace iteration is an extension of the simple power method applied
to a starting matrix consisting of linearly independent vectors. When the columns of
the starting matrix are orthonormal, subspace iteration is also referred to as
orthogonal iteration. Thus, we may make use of the practical knowledge available about
orthogonal/subspace iteration methods.

Simple orthogonal iteration is introduced in § 8.1. A more elaborate version,
shifted orthogonal iteration, is the subject of § 8.2. Comparing orthogonal iteration
and an IRA-iteration is considered in § 8.3, including an adaptive procedure for
preventing stagnation and accelerating the convergence of the iteration. An implicitly
shifted orthogonal iteration algorithm, analogous to the IRA-iteration, is introduced
in the final section.


8.1  Orthogonal Iteration

Suppose that $AQ = QR$ is a real Schur decomposition where we partition
$Q = \begin{bmatrix} Q_k & Q_{n-k} \end{bmatrix}$ and the eigenvalues are ordered in descending order of magnitude along
the quasi-diagonal of R. If $|\lambda_k| > |\lambda_{k+1}|$ then $\mathcal{D}_k(A) \equiv \mathcal{R}(Q_k)$ is said to be the
dominant invariant subspace of dimension k for A.

Simple orthogonal iteration is defined by the following procedure:

Algorithm 8.1 (Simple Orthogonal Iteration)

1.1  Initialize: $U_k^{(1)} \leftarrow \begin{bmatrix} e_1 & e_2 & \cdots & e_k \end{bmatrix}$ ;

1.2  For j = 1,2,... ;

2.1  $W_k^{(j)} = AU_k^{(j)}$ ;

2.2  Compute the QR factorization $U_k^{(j+1)} R_k^{(j+1)} = W_k^{(j)}$ ;

1.3  End j.

Golub and Van Loan [35, page 354] show that if

(8.1.1)  $\mathcal{D}_k(A^T)^\perp \cap \mathrm{Span}\{e_j\}_{j=1}^k = \{0\}$,

then $\mathcal{R}(U_k^{(j)}) \to \mathcal{D}_k(A)$ as $j \to \infty$ and the rate of convergence is proportional to $|\lambda_{k+1}/\lambda_k|$.

Thus, $(U_k^{(j)})^T A U_k^{(j)} = T_k^{(j)}$ is converging to $R_k = Q_k^T A Q_k$. The geometrical
interpretation of the subspace condition in equation (8.1.1) is that a vector in $\mathrm{Span}\{e_j\}_{j=1}^k$
must have a nonzero component in the direction of some vector in $\mathcal{D}_k(A)$. Since
$AQ = QR$ implies that $A^T Q = QR^T$, we may equate the last n - k columns to obtain
that $\mathcal{D}_k(A^T)^\perp = \mathcal{R}(Q_{n-k})$.
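
A minimal NumPy sketch of Algorithm 8.1 follows. The fixed iteration count, the function name `orthogonal_iteration`, and the symmetric test matrix with prescribed eigenvalues are assumptions made for illustration; a practical code would test convergence instead of iterating a fixed number of times.

```python
import numpy as np

def orthogonal_iteration(A, k, iters=200):
    """Simple orthogonal iteration started from the first k columns of the identity."""
    n = A.shape[0]
    U = np.eye(n)[:, :k]              # line 1.1
    for _ in range(iters):            # line 1.2
        W = A @ U                     # line 2.1
        U, _ = np.linalg.qr(W)        # line 2.2
    T = U.T @ A @ U                   # projected matrix
    return U, T

# Symmetric test matrix with a clear gap between the 3rd and 4th eigenvalues.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
eigs = np.array([10.0, 8.0, 6.0, 1.0, 0.8, 0.6, 0.4, 0.2])
A = Q @ np.diag(eigs) @ Q.T
U, T = orthogonal_iteration(A, k=3)
residual = np.linalg.norm(A @ U - U @ T)
ritz = np.sort(np.linalg.eigvals(T).real)
```

With the gap ratio $|\lambda_4/\lambda_3| = 1/6$ in this example, the residual $\|AU - UT\|$ is driven to rounding level well before the iteration count is exhausted.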

As Golub and Van Loan [35, page 355] also observe, the QR-iteration is orthogonal
iteration in disguise. Consider the identities

(8.1.2)  $T_k^{(j)} = (U_k^{(j)})^T A U_k^{(j)} = (U_k^{(j)})^T W_k^{(j)} = (U_k^{(j)})^T U_k^{(j+1)} R_k^{(j+1)}$,

and

(8.1.3)  $T_k^{(j+1)} = (U_k^{(j+1)})^T A U_k^{(j+1)} = (U_k^{(j+1)})^T A U_k^{(j)} (U_k^{(j)})^T U_k^{(j+1)}$.

The identity in equation (8.1.2) computes the QR factorization of $T_k^{(j)}$ while the
second in equation (8.1.3) multiplies the factors in reverse order to get $T_k^{(j+1)}$:
successive orthogonal iterations define a QR step with zero shift! This is also
implied by Theorem 3.2. We remark that if A is first reduced to upper Hessenberg
form H, and Algorithm 8.1 is used with A replaced with H, then the QR-iteration with
zero shifts is reproduced, with $(U_k^{(j)})^T U_k^{(j+1)} = Q^{(j)}$.
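
The two identities can be checked numerically. The sketch below assumes the full case k = n, where $U^{(j)}(U^{(j)})^T = I$ holds exactly, and uses a random test matrix chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))

U1 = np.eye(n)                       # k = n: all columns of the identity
U2, R2 = np.linalg.qr(A @ U1)        # one step of orthogonal iteration

T1 = U1.T @ A @ U1
T2 = U2.T @ A @ U2
Q_step = U1.T @ U2                   # orthogonal factor of the implied QR step

qr_ok = np.allclose(T1, Q_step @ R2)     # (8.1.2): T1 = Q R
rq_ok = np.allclose(T2, R2 @ Q_step)     # (8.1.3): T2 = R Q
```

Both checks succeed because $A = U_2 R_2$ when $U_1 = I$, so that $T_2 = U_2^T (U_2 R_2) U_2 = R_2 U_2$.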

8.2  Shifted  Orthogonal  Iteration 

The following extension of orthogonal iteration allows a set of shifts to be applied.
This  allows  the  possibility  of  converging  to  another  invariant  subspace  of  A  besides 
the  dominant  one.  We  present  the  algorithm  first  and  then  discuss  its  many  features 
at  some  length. 

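Algorithm 8.2 (Shifted Orthogonal Iteration)
function $[U_k, T_k]$ = orthit(A, k, p)

Output: $AU_k - U_kT_k = F_k$ where the residual matrix $F_k$ is small in norm,

$U_k^TU_k = I_k$ and $U_k^TF_k = 0$ and $T_k$ is upper quasi-triangular.

1.1  Initialize: $U_{k+p}^{(1)} \leftarrow \begin{bmatrix} e_1 & e_2 & \cdots & e_{k+p} \end{bmatrix}$ ;

1.2  For j = 1,2,...

2.1  $W_{k+p}^{(j)} \leftarrow \mathcal{P}^{(j)}(A)U_{k+p}^{(j)}$ where $\mathcal{P}^{(j)}(\lambda) = (\lambda - \tau_1^{(j)}) \cdots (\lambda - \tau_m^{(j)})$ ;

2.2  Compute the QR factorization: $Q_{k+p}^{(j)}R_{k+p}^{(j)} = W_{k+p}^{(j)}$ ;

2.3  $B_{k+p}^{(j)} \leftarrow (Q_{k+p}^{(j)})^T A Q_{k+p}^{(j)}$ ;

2.4  Compute the real Schur decomposition $B_{k+p}^{(j)}Z_{k+p}^{(j)} = Z_{k+p}^{(j)}T_{k+p}^{(j+1)}$

with the k wanted eigenvalues in the leading principal matrix
$T_k^{(j+1)}$ of order k, in $T_{k+p}^{(j+1)}$ ;

2.5  $U_{k+p}^{(j+1)} \leftarrow Q_{k+p}^{(j)}Z_{k+p}^{(j)}$ ;

2.6  Determine convergence ; Deflate converged Ritz vectors ; Modify
p if desired ;

1.3  End j.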

We now consider many of the details that will lead to a robust implementation
of Algorithm 8.2. As we shall see, many of these details will carry over to a robust
implementation of an IRA-iteration. In particular, we consider the two codes, EA12 by
Duff and Scott [28] and SRRIT by Bai and Stewart [10], as model implementations.
We remark that SRRIT is only set up to compute A's dominant invariant subspace,
while EA12 computes this space as well as the invariant subspace corresponding to
A's right-most or left-most eigenvalues.

Line 2.1 applies the m shifts $\tau_1^{(j)}, \ldots, \tau_m^{(j)}$. In order to avoid the use of complex arithmetic,
if any shift has a nonzero imaginary part, its complex conjugate is also a shift during
the same cycle of iteration. As with the IRA-iteration, there are many choices. One
could use an exact shift strategy, applying the m = p unwanted eigenvalues of the most
recently computed projected matrix as shifts during the j-th iteration. This leaves an
arbitrary choice of shifts during the first iteration. As discussed in [28], the matrix
polynomial $\mathcal{P}^{(j)}(A)$ is not formed. Rather, the columns of $\mathcal{P}^{(j)}(A)U_{k+p}^{(j)}$ are formed using
the recurrence $W_{k+p}^{(j)} \leftarrow (A - \tau_i^{(j)}I)W_{k+p}^{(j)}$ for i = 1,...,m, starting from $W_{k+p}^{(j)} = U_{k+p}^{(j)}$.
If a Chebyshev polynomial is used, the three term recurrence
should be employed [76]. We note that if all the shifts applied are zero, then simple
orthogonal iteration is recovered.
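
The recurrence for line 2.1 can be sketched as follows. This is an illustration under assumptions rather than the scheme used by EA12 or SRRIT: the function name `apply_shifts` is hypothetical, and complex conjugate pairs are handled here by multiplying out the pair into a real quadratic factor, which is one common way of staying in real arithmetic.

```python
import numpy as np

def apply_shifts(A, U, shifts):
    """Return P(A) @ U for P(lam) = prod_i (lam - tau_i) without forming P(A).

    Complex shifts must occur in conjugate pairs; each pair is applied as
    the real factor A^2 - 2 Re(tau) A + |tau|^2 I.
    """
    W = np.array(U, dtype=float)
    remaining = [complex(t) for t in shifts]
    while remaining:
        tau = remaining.pop(0)
        if tau.imag == 0.0:
            W = A @ W - tau.real * W
        else:
            idx = [i for i, t in enumerate(remaining) if np.isclose(t, tau.conjugate())]
            if not idx:
                raise ValueError("complex shifts must occur in conjugate pairs")
            remaining.pop(idx[0])
            AW = A @ W
            W = A @ AW - 2.0 * tau.real * AW + abs(tau) ** 2 * W
    return W

rng = np.random.default_rng(4)
n = 7
A = rng.standard_normal((n, n))
U = np.eye(n)[:, :3]
shifts = [0.5, 1 + 2j, 1 - 2j]
W = apply_shifts(A, U, shifts)

# Reference: form the matrix polynomial explicitly (only sensible for tiny n).
I = np.eye(n)
P = (A - 0.5 * I) @ (A - (1 + 2j) * I) @ (A - (1 - 2j) * I)
```

Each shift costs one block of matrix-vector products with A, so the degree m directly sets the number of products per cycle.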

The degree m of the polynomial applied should not be chosen too large, for
otherwise the columns of $W_{k+p}^{(j)}$ will become linearly dependent. However, a small value
of m leads to unnecessary orthogonalizations. The important property is that the
columns of $W_{k+p}^{(j)}$ of line 2.1 remain numerically linearly independent. Guidelines are
provided in [10, 28] for determining the degree in an adaptive fashion within the software.

Line 2.5 uses a Schur-Rayleigh-Ritz (SRR) step to ensure that $U_{k+p}^{(j)}e_i$ converges
to the i-th Schur vector corresponding to some ordering of the eigenvalues of A.
The SRR step was originally introduced by Stewart [88] within the context of simultaneous
iteration, $\mathcal{P}^{(j)}(\lambda) = \lambda^{m_j}$, where performing a SRR step gives that $U_{k+p}^{(j)}e_i$ converges to
the Schur vector associated with the i-th largest in magnitude eigenvalue of A. Each
column of $U_{k+p}^{(j)}$ converges at the rate of $|\lambda_{k+p+1}/\lambda_i|$ where A's eigenvalues are ordered
in descending order of magnitude. Thus, the initial columns of $U_{k+p}^{(j)}$ converge faster than
the latter ones and increasing the value of p allows a faster rate. We remark that a SRR
step does not actually accelerate convergence: the effect is to unscramble the approximations
to Schur vectors already present in the column space of $U_{k+p}^{(j)}$. Stewart [88] made this
observation and both Chatelin [18, page 253] and Saad [75, page 132] give elegant
but elementary proofs.

Both EA12 and SRRIT compute the l-th column of the residual matrix
$AU_{k+p}^{(j)} - U_{k+p}^{(j)}T_{k+p}^{(j)}$, where the first l - 1 columns have already converged. Bai and Stewart
further discuss the convergence of SRRIT to the invariant subspace corresponding to
nearly equimodular eigenvalues. As the columns of $U_{k+p}^{(j)}$ converge, deflation
techniques such as locking should be employed.


8.2.1  Convergence  of  Shifted  Orthogonal  Iteration 

Algorithm 8.2 requires a non-negative value of p. The last p columns of $U_{k+p}^{(j)}$ are called
guard vectors. When $\mathcal{P}^{(j)}(\lambda) = \lambda^{m_j}$, increasing the number of guard vectors
accelerates the convergence of Algorithm 8.2 to the wanted invariant subspace. However, the
number of matrix vector products with A also increases, as well as the work necessary
to maintain the orthogonality of $U_{k+p}^{(j)}$. As with an IRA-iteration, the decision on how
to choose p depends upon many factors.

Watkins and Elsner [100, pages 29-35] provide convergence results for a
non-stationary, i.e. shifted, subspace iteration which give an indication of how p might
be selected. We present one of their results, which is seen to be a generalization to
the non-stationary case of the Golub and Van Loan [35] result referenced in § 8.1.


Theorem 8.3 Let $A \in \mathbb{R}^{n \times n}$. Suppose that the eigenvalues of A are all
of algebraic multiplicity one and denote by $\lambda_1, \lambda_2, \ldots, \lambda_n$ some ordering of
A's eigenvalues. Let a real Schur decomposition $AQ = QR$ be given where
$Q = \begin{bmatrix} Q_{k+p} & Q_{n-k-p} \end{bmatrix}$ and $Q_{k+p}^TAQ_{k+p} = R_{k+p}$ contains the eigenvalues
$\lambda_1, \ldots, \lambda_{k+p}$ where complex conjugate pairs are kept together: $\lambda_i = \bar{\lambda}_j$
implies that $i, j \le k+p$. Define the matrix $U_{k+p}^{(1)} = \begin{bmatrix} e_1 & e_2 & \cdots & e_{k+p} \end{bmatrix}$.

Let $\mathcal{P}_{M_J}(\lambda) = \psi_{m_J}^{(J)}(\lambda) \cdots \psi_{m_1}^{(1)}(\lambda)$ be the product of a sequence of polynomials
for some positive integer J such that $\mathcal{P}_{M_J}(\lambda_i) \ne 0$ for $i = 1, \ldots, k+p$ and
$M_J = m_1 + \cdots + m_J$. Also assume that if any root of $\psi_{m_j}^{(j)}(\lambda)$ has a nonzero
imaginary part, its complex conjugate is also a root.

If $\mathcal{R}(U_{k+p}^{(1)}) \cap \mathcal{R}(Q_{n-k-p}) = \{0\}$ and

(8.2.1)  $\dfrac{\max_{k+p+1 \le i \le n} |\mathcal{P}_{M_J}(\lambda_i)|}{\min_{1 \le l \le k+p} |\mathcal{P}_{M_J}(\lambda_l)|} \to 0$,

as $J \to \infty$ then the non-stationary iteration defined by $\mathcal{P}_{M_J}(A)U_{k+p}^{(1)}$
converges in the sense that $\mathcal{R}(U_{k+p}^{(J)}) \to \mathcal{R}(Q_{k+p})$.


Proof See Theorem 5.1 in [100, page 29].


□

The geometrical interpretation of the subspace condition

$\mathcal{R}(U_{k+p}^{(1)}) \cap \mathcal{R}(Q_{n-k-p}) = \{0\}$

of the theorem is similar to that given in § 8.1 for simple orthogonal iteration. Some
vector in $\mathrm{Span}\{e_j\}_{j=1}^{k+p}$ must have a nonzero component in the direction of some vector
in $\mathcal{R}(Q_{k+p})$.

If $\mathcal{P}_{M_J}(\lambda) = \lambda^{M_J}$ and $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$ where $|\lambda_k| > |\lambda_{k+1}|$ then
traditional subspace iteration is recovered. The ratio in equation (8.2.1) gives the rate of
convergence $|\lambda_{k+p+1}/\lambda_{k+p}|$ for $U_{k+p}^{(J)}$ approaching A's dominant invariant subspace.
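
The ratio in equation (8.2.1) is simple to evaluate when the eigenvalues are known, which makes it a convenient way to experiment with shift choices on small test problems. The helper below is a hypothetical illustration: the name `convergence_ratio` and the sample eigenvalues are not from the thesis, and the eigenvalues are assumed to be listed with the k + p retained ones first.

```python
import numpy as np

def convergence_ratio(eigs, shifts, k_plus_p):
    """Evaluate max_{i > k+p} |P(lam_i)| / min_{l <= k+p} |P(lam_l)|
    for P(lam) = prod_t (lam - tau_t)."""
    eigs = np.asarray(eigs, dtype=complex)
    shifts = np.asarray(shifts, dtype=complex)
    vals = np.abs(np.prod(eigs[:, None] - shifts[None, :], axis=1))
    return vals[k_plus_p:].max() / vals[:k_plus_p].min()

eigs = [4.0, 3.0, 2.0, 1.0, 0.5]

# Two zero shifts: P(lam) = lam^2, so the ratio is |lam_4 / lam_3|^2.
zero_shift_ratio = convergence_ratio(eigs, [0.0, 0.0], k_plus_p=3)

# Shifts placed exactly on the two unwanted eigenvalues annihilate them.
exact_shift_ratio = convergence_ratio(eigs, [1.0, 0.5], k_plus_p=3)
```

With zero shifts the ratio reduces to a power of $|\lambda_{k+p+1}/\lambda_{k+p}|$, in agreement with the traditional subspace iteration rate quoted above.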

For more general shifting strategies, the ratio in equation (8.2.1) gives the global
rate of convergence of Algorithm 8.2 to an invariant subspace. Using SRR steps, we
formally extend Stewart's result to $U_{k+p}^{(J)}e_j$ converging at the rate of

(8.2.2)  $\dfrac{\max_{k+p+1 \le i \le n} |\mathcal{P}_{M_J}(\lambda_i)|}{\min_{1 \le l \le j} |\mathcal{P}_{M_J}(\lambda_l)|}$.

This convergence rate may be significantly better than the one given by $|\lambda_{k+p+1}/\lambda_j|$
when the interest is in the Schur vectors associated with A's dominant invariant
subspace. Since the convergence rate is a complicated function involving the shifts
applied, it is not obvious how to select the value of p that
leads to optimal convergence. Further numerical experimentation is needed.

Again,  as  noted  in  §  4.2  of  Chapter  4  with  an  IRA-iteration,  the  success  of 
Algorithm  8.2  depends  upon  the  quality  of  the  shifting  strategy. 


8.3  Comparing  Orthogonal  and  an  IRA-iteration 

It is instructive to compare an IRA-iteration with that of Algorithm 8.2. For each of
Algorithms 8.2 and 4.2 (an IRA-iteration) of Chapter 4, we have

$AQ_{k+p}^{(j)} = Q_{k+p}^{(j)}B_{k+p}^{(j)} + F_{k+p}^{(j)}$,

$AV_{k+p}^{(j+1)} = V_{k+p}^{(j+1)}H_{k+p}^{(j+1)} + f_{k+p}^{(j+1)}e_{k+p}^T$,

respectively. We comment that at this point of each algorithm, a polynomial $\mathcal{P}^{(j)}(A)$
has been applied. Algorithm 8.2 applies the polynomial matrix $\mathcal{P}^{(j)}(A)$ at the
beginning of its iteration to the columns of $U_{k+p}^{(j)}$ while Algorithm 4.2 applies the polynomial
during the implicit application of the shifts.




Note that $(V_{k+p}^{(j+1)})^TAV_{k+p}^{(j+1)} = H_{k+p}^{(j+1)}$ and $(Q_{k+p}^{(j)})^TAQ_{k+p}^{(j)} = B_{k+p}^{(j)}$. Both matrices
represent the orthogonal projections of A onto two, in general, different column spaces.
Suppose the same shifts are applied during each iteration of Algorithms 8.2 and 4.2:
$\mathcal{P}^{(i)}(\lambda) = \psi^{(i)}(\lambda)$ for i = 1,...,j where the polynomials $\psi^{(i)}$ of degree p were
defined in the development leading up to equation (4.2.7) of § 4.2 in Chapter 4. The
column spaces are:

$\mathcal{R}(V_{k+p}^{(j+1)}) = \mathcal{K}_{k+p}(A, v_1^{(j+1)})$

$= \mathcal{K}_{k+p}(A, \mathcal{P}_{jp}(A)v_1^{(1)})$,

$\mathcal{R}(Q_{k+p}^{(j)}) = \mathcal{R}(\mathcal{P}^{(j)}(A)U_{k+p}^{(j)})$

$= \mathcal{R}(\mathcal{P}_{jp}(A)U_{k+p}^{(1)})$.

Moreover, suppose that $v_1^{(1)} = e_1$ and recall that $U_{k+p}^{(1)}$ represents the first k+p columns
of the identity matrix. Algorithm 8.2 computes the leading k + p columns of the QR
factorization of $\mathcal{P}_{jp}(A)$. On the other hand, if we assume that the grade of $e_1$ is at
least k + p, Algorithm 4.2 computes the leading k + p columns of the Krylov matrix
$\mathcal{K}_{k+p}(A, \mathcal{P}_{jp}(A)e_1)$.

As explained in § 4.2 of Chapter 4, the last p columns of the above Arnoldi
factorization are discarded because of the fill-in suffered by $e_{k+p}^TZ_{k+p}^{(j)}$. Extending the
ensuing length k Arnoldi factorization to length k+p allows, in general, a different set
of Arnoldi vectors to be appended as the last p columns of $V_{k+p}^{(j+1)}$ during each cycle
of iteration. This is one of the major differences between Algorithms 4.2 and 8.2.
Algorithm 8.2 applies a polynomial in A to the same initial subspace determined by
$\mathcal{R}(U_{k+p}^{(1)})$.

Parlett [62] presents an excellent survey comparing the Lanczos and subspace
iterations for the symmetric eigenvalue problems arising in structural mechanics. The
conclusion reached is that the Lanczos iteration is almost always a superior algorithm.
The literature is sparse for similar comparisons between Arnoldi's method and subspace
iteration for nonsymmetric eigenvalue problems. As Chatelin [18, page 281] notes, the
choice between the two nonsymmetric algorithms is not so clear.

8.3.1  Adaptive  Procedures  used  within  an  IRA-iteration 

As explained in § 3.2 of Chapter 3, the QR-iteration is a nested sequence of subspace
iterations. Since the IRA-iteration is just the leading portion of the QR-iteration,
this section gives an indication of how to determine the value of p needed by an
IRA-iteration by considering the formal connections with subspace iteration.

The application of shifts during an IRA-iteration is analogous to performing a
SRR step. Lines 2.3-2.4 of Algorithm 4.2 effectively apply the SRR step: the first k
columns of $Z^{(j)}$ span the wanted invariant subspace of $H_{k+p}^{(j)}$ and the resulting updated
Arnoldi factorization of Line 2.4, built from $AV_{k+p}^{(j)}Z_k^{(j)}$, represents the application
of the SRR projection. As equation (8.2.2)
indicates, the optimal choice of p is a complicated decision. On the one hand, the
number of shifts applied should be sufficient so that $\mathcal{P}^{(j)}(A)$ annihilates the unwanted
components of $v_1$. On the other hand, since application of the shifts is equivalent to
a SRR step, the discussion following Theorem 8.3 indicates that too large a value
of p may slow down convergence. Extensive numerical experiments dictate that the
value of p should be slightly decreased during each iteration.


8.4  Implicitly  Shifted  Orthogonal  Iteration 

The main expense of Algorithm 8.2 is the formation of matrix vector products with
A at lines 2.1 and 2.3. The polynomial in A may instead be applied
implicitly through B and hence the cost of Algorithm 8.2 may be reduced. We first
establish the following result.

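Lemma 8.1 Let $A \in \mathbb{R}^{n \times n}$, $B \in \mathbb{R}^{k \times k}$ and $U \in \mathbb{R}^{n \times k}$ with $U^TU = I_k$.

Let $AU = UB + F$ where $F = AU - UB$. If $\mathcal{P}(\lambda) = (\lambda - \tau_1) \cdots (\lambda - \tau_m)$
then

(8.4.1)  $\mathcal{P}(A)U = U\mathcal{P}(B) + \hat{F}$,

where $\hat{F} = \mathcal{P}(A)U - U\mathcal{P}(B)$.

Proof The proof is by induction. Consider applying the polynomial of degree one:

$AU = UB + F$,

$AU - \tau_1U = UB - \tau_1U + F$,

$(A - \tau_1I_n)U = U(B - \tau_1I_k) + F$,

and the base case is established. Suppose that equation (8.4.1) holds for all monic
polynomials of degree less than or equal to m - 1, and let $\mathcal{P}$ denote such a polynomial
with residual $\hat{F}$. Defining $F = AU - UB$ it follows
that

$(A - \tau_mI_n)\mathcal{P}(A)U = (A - \tau_mI_n)U\mathcal{P}(B) + (A - \tau_mI_n)\hat{F}$

$= U(B - \tau_mI_k)\mathcal{P}(B) + F\mathcal{P}(B) + (A - \tau_mI_n)\hat{F}$.

Finally, the result on the residual follows since

$F\mathcal{P}(B) + (A - \tau_mI_n)\hat{F} = (AU - UB)\mathcal{P}(B) + (A - \tau_mI_n)(\mathcal{P}(A)U - U\mathcal{P}(B))$

$= -UB\mathcal{P}(B) + (A - \tau_mI_n)\mathcal{P}(A)U + \tau_mU\mathcal{P}(B)$

$= (A - \tau_mI_n)\mathcal{P}(A)U - U(B - \tau_mI_k)\mathcal{P}(B)$.


□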

Although m matrix vector products with A may be avoided during each cycle of the
iteration, the error in using $\mathcal{P}(B)$ is $\hat{F} = \mathcal{P}(A)U - U\mathcal{P}(B)$. As the range of U
improves as an approximation to an invariant subspace of A, the error $\hat{F}$ is accordingly
reduced. If $AV = VT$, where T is upper triangular, then a simple calculation shows
that $\mathcal{P}(A)V = V\mathcal{P}(T)$ and hence the residual

$\mathcal{P}(A)V - V\mathcal{P}(B) = V(\mathcal{P}(T) - \mathcal{P}(B)) = 0$,

since $\mathcal{P}(B) = V^T\mathcal{P}(A)V$.
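
Lemma 8.1 and the remark on invariant subspaces can be checked numerically. The sketch below is an illustration under assumptions: the helper `poly_apply`, the two sample shifts, and the symmetric test matrix (chosen so that an exact invariant subspace is available from `numpy.linalg.eigh`) are not taken from the thesis.

```python
import numpy as np

def poly_apply(M, shifts):
    """Form P(M) = (M - tau_m I) ... (M - tau_1 I) explicitly (small matrices only)."""
    I = np.eye(M.shape[0])
    P = I.copy()
    for tau in shifts:
        P = (M - tau * I) @ P
    return P

rng = np.random.default_rng(3)
n, k = 8, 3
A = rng.standard_normal((n, n))
A = A + A.T
tau1, tau2 = 0.3, -1.2
shifts = [tau1, tau2]

# Generic orthonormal U: the residual of the polynomial relation is not small.
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
B = U.T @ A @ U
F = A @ U - U @ B
F_hat = poly_apply(A, shifts) @ U - U @ poly_apply(B, shifts)
generic_err = np.linalg.norm(F_hat)

# Induction step of the proof for m = 2, where the degree-one residual is F itself.
F_hat_step = F @ (B - tau1 * np.eye(k)) + (A - tau2 * np.eye(n)) @ F

# U spanning an exact invariant subspace: the residual vanishes.
_, X = np.linalg.eigh(A)
U_inv = X[:, :k]
B_inv = U_inv.T @ A @ U_inv
invariant_err = np.linalg.norm(
    poly_apply(A, shifts) @ U_inv - U_inv @ poly_apply(B_inv, shifts)
)
```

The contrast between `generic_err` and `invariant_err` illustrates why the implicit application of the polynomial through B becomes more accurate as the iteration converges.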

Computing the orthogonal factorization $\mathcal{P}(B) = QR$ we obtain

(8.4.2)  $\mathcal{P}(A)U = UQR + \hat{F}$,

where $\hat{F} = \mathcal{P}(A)U - U\mathcal{P}(B)$. Post-multiplying equation (8.4.2) by Q results in

(8.4.3)  $\mathcal{P}(A)(UQ) = (UQ)Q^T\mathcal{P}(B)Q + \hat{F}Q$,

since $RQ = Q^T\mathcal{P}(B)Q$. Thus, m QR steps are performed with the set of shifts
$\{\tau_i\}_{i=1}^m$. Note that post-multiplication with the orthogonal Q in equation (8.4.3) does
not change the size of the error $\hat{F}$. Lines 2.1-2.3 of Algorithm 8.2 may be replaced
to obtain the following procedure:

Algorithm  8.4 

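2.1  Compute the QR factorization: $Q_{k+p}^{(j)}R_{k+p}^{(j)} = \mathcal{P}^{(j)}(B_{k+p}^{(j)})$ where
$\mathcal{P}^{(j)}(\lambda) = (\lambda - \tau_1^{(j)}) \cdots (\lambda - \tau_m^{(j)})$ ;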

2.2  AUi^’l  . 

An interesting observation comes from comparing the application of shifts in Algorithm 4.2
with the above implicit application of shifts. Algorithm 4.2 discards the last p columns
due to the fill-in that occurs, in contrast to the above implicit application of shifts.
(See Figures 4.1-4.3 of Chapter 4 for an illustration of the fill-in.)

The convergence properties and numerical behavior of the above implicitly shifted
orthogonal iteration require further investigation. For example, what is the rate of
convergence of Algorithm 8.4 when the interest is in A's dominant invariant subspace,
i.e. using zero shifts? If the convergence rate is competitive with Algorithm 8.2, then
a significant savings in computational effort may be realized by avoiding m
matrix-vector products during each iteration cycle.


Chapter 9


Thesis  Summary  and  Future  work 


This dissertation has examined Sorensen's implicitly re-started Arnoldi iteration.
After  an  introduction  to  the  goals  and  subject  of  the  thesis  in  Chapter  one,  the 
second  and  third  chapters  established  the  connection  that  an  IRA-iteration  is  mathe¬ 
matically  equivalent  to  building  only  the  leading  portion  of  a  QR-iteration  of  a  matrix. 
The  practical  QR  algorithm  was  considered  in  some  detail  since  the  major  goal  of  this 
thesis  is  to  present  numerical  techniques  that  result  in  a  robust  implementation  of  an 
IRA-iteration.  Chapter  4  both  investigated  and  surveyed  the  various  ways  in  which  to 
re-start  an  Arnoldi  factorization.  It  was  shown  that  the  IRA-iteration  uses  the  same 
mechanism  as  the  implicitly  shifted  QR  algorithm  and  thus  enjoys  its  many  stability 
properties.  Chapter  5  examined  the  possible  loss  of  forward  stability  that  an  IRA- 
iteration  undergoes  and  considered  its  impact  upon  the  Ritz  values.  A  fundamental 
connection  between  the  algorithms  used  to  re-order  a  Schur  decomposition  and  an 
IRA-iteration was also made. The forward instability of the QR algorithm was shown
to  be  responsible  for  the  occasional  failure  of  the  implicit  re-starting  technique.  A 
sensitivity  analysis  was  also  presented  for  the  orthogonal  reduction  of  a  matrix  to 
upper  Hessenberg  form.  Thus,  the  forward  instability  of  an  IRA-iteration  was  seen  to 
have  a  geometric  interpretation:  Small  components  of  the  starting  vector  that  are  in 
unwanted  invariant  subspaces  are  possibly  amplified  during  the  iteration. 

Deflation techniques for an IRA-iteration were the subject of Chapter 6. The first
technique,  Locking,  allows  an  orthogonal  change  of  basis  for  an  Arnoldi  factorization 
which  results  in  a  partial  Schur  decomposition  containing  the  converged  Ritz  values. 
The  corresponding  Ritz  value  is  deflated  in  an  implicit  manner.  The  second  technique, 
Purging,  allows  implicit  removal  of  unwanted  converged  Ritz  values  from  the  Arnoldi 
iteration.  Both  deflation  techniques  are  accomplished  by  working  with  matrices  in 
the  projected  Krylov  space  which  for  large  eigenvalue  problems  is  a  fraction  of  the 
order  of  the  matrix  from  which  estimates  are  sought.  Since  both  deflation  techniques 
are  implicitly  applied  to  the  Arnoldi  factorization,  the  need  for  explicit  re-starting 
associated with all other deflation strategies is avoided. Both techniques were
carefully examined with respect to numerical stability and computational results were
presented. Convergence of the Arnoldi iteration is improved and a reduction in
computational effort is realized. The numerical examples demonstrate how the deflation
techniques  remove  the  requirement  for  a  block  Arnoldi/Lanczos  method  to  compute 
approximations  to  multiple  or  clustered  eigenvalues. 

The  final  two  chapters  surveyed  and  presented  formal  analysis  for  the  practical 
issues  associated  with  maintaining  orthogonality  of  the  Arnoldi  vectors  and  choosing 
p, the number of shifts to apply. In addition, two simultaneous iteration algorithms
were  introduced  that  require  further  investigation. 

9.1  Future  Work 

There  remain  several  areas  that  require  further  research.  The  future  goal  is  to  better 
understand  all  of  the  practical  issues  that  will  lead  to  optimal  convergence  of  an 
IRA-iteration.

1. Robust stopping criteria, especially for nonsymmetric eigenvalue problems. The
discussion of § 2.5 in Chapter 2 indicated how much a better understanding is
needed, especially of the impact of the non-normality of A.

2. Practical convergence aspects/theory. Although Chapter 8 established a connection
between an IRA-iteration and shifted orthogonal iteration, more work is
required  in  order  to  determine  near  optimal  adaptive  selection  of  p  relative  to 
k. 

3.  Reliability  of  an  IRA-iteration.  When  successful,  Algorithm  4.2  computes  an 
approximate  invariant  subspace  of  A  of  dimension  k.  However,  there  is  no 
guarantee  that  this  is  the  wanted  invariant  subspace.  For  example,  suppose  the 
wanted  invariant  subspace  has  an  eigenvalue  of  multiplicity  greater  than  one. 
Does an IRA-iteration correctly resolve this multiplicity? We remark that all
numerical  methods  for  computing  a  few  eigenvalues  for  a  nonsymmetric  matrix 
A  face  this  dilemma. 

4.  Further  investigation  is  needed  to  establish  a  direct  connection  between  the 
forward  instability  of  an  IRA-iteration  and  the  sensitivity  of  reducing  a  matrix 
to upper Hessenberg form via orthogonal transformations. Theorem 5.3 gives a
geometrical interpretation of forward instability, but a link with the Parlett and
Le [63] condition would be interesting.

5. The generalized eigenvalue problem Ax = Bxλ. This dissertation concentrated
on the case where B = I. When B is not the identity matrix, either A, B, or a
linear  combination  of  the  two  must  be  factored.  For  symmetric  A,  the  work  of 
Ericsson  and  Ruhe  [30]  considers  the  spectral  transformation  Lanczos  method 
which  was  further  extended  by  Nour-Omid,  Parlett,  Ericsson  and  Jensen  [56]. 
The  ARPACK  [49]  software  implements  the  techniques  described  in  the  latter 
study. Saad [78] discusses the many difficulties that arise for the nonsymmetric
generalized eigenvalue problem. The recent work of Meerbergen and Spence [52]
discusses the special but important case of A nonsymmetric and B symmetric
positive semi-definite.

6.  Preconditioning  techniques  for  an  IRA-iteration.  The  analysis  and  techniques 
presented  in  this  dissertation  also  serve  to  establish  the  viability  of  computing 
approximations  to  selected  portions  of  A’s  spectrum  using  a  preconditioner  that 
only  needs  matrix  vector  products.  The  motivation  for  using  preconditioning  for 
eigenvalue  problems  is  to  allow  faster  and  more  robust  convergence  to  selected 
portions  of  A’s  spectrum  that  are  of  interest.  It  is  often  observed  that  the 
wanted  eigenvalues  are  not  those  that  the  Arnoldi  iteration  naturally  converges 
towards. We first clarify the concept of preconditioning for eigenvalue problems.
A preconditioner F is a transformation on A that results in the matrix F(A). A
good preconditioner results if the Arnoldi/Lanczos iterations on F(A) converge
most rapidly towards the wanted eigenvalues of A under the transformation.

Among the most powerful preconditioners employed are those factoring and
solving linear systems with A. An important example is the shift and invert or
spectral transformation defined by F(A) = (A - σI)^{-1}. The transformation has
the effect of transforming the eigenvalues of A closest to σ into large and well
separated ones for F(A). The eigenvectors of F(A) are the same as those of A
and the eigenvalues are related through the transformation: an eigenvalue λ of A
corresponds to the eigenvalue 1/(λ - σ) of F(A). Saad [78] discusses
the shift and invert Arnoldi method for nonsymmetric eigenvalue problems. Ruhe
introduces  and  examines  the  use  of  rational  preconditioners  in  the  series  of 
papers  [69,  70,  71].  The  work  of  Meerbergen  and  Roose  [51]  presents  an  excellent 
overview of preconditioning for the nonsymmetric eigenvalue problem.

The primary drawback in using rational preconditioning is that linear systems
involving A require solution. This may prove quite inefficient and prohibitive in
many eigenproblems. Although the order of A is often the culprit, moderately
sized eigenvalue problems may involve dense matrices that are expensive both
to store and factor. This thesis demonstrates that it is often possible to converge
to the extremal portions of the spectrum of A using only matrix vector
products or employing a polynomial preconditioner. In these situations, the
expense of factoring and solving linear systems with A is avoided. The decision
whether to use only polynomial preconditioning involves a tradeoff between
the number of matrix vector products versus the number of matrix factorizations
and linear system solutions that are required, respectively, for solution of
the eigenproblem. Further work is required in better understanding all these
issues as well as the impact of other shifting strategies besides the exact one
considered in this thesis. In particular, the use of an IRA-iteration for computing
approximations to the interior eigenvalues of A needs to be carefully examined.

7. An evaluation of software for solving large sparse nonsymmetric eigenvalue problems.
The last few years have seen a vigorous research effort in numerical methods
for large scale nonsymmetric eigenvalue problems. This effort is starting to be
realized in high quality software. However, a review and survey of the current
software and the algorithms implemented is needed. The motivation for undertaking
this study is to begin the critical review necessary to compare and test
the  underlying  algorithms  used  in  the  various  software  approaches  and  to  better 
understand  where  improvements  are  needed.  The  software  approaches  needing 
review  include: 

• The block nonsymmetric Lanczos algorithm [6].

•  The  block  Arnoldi  algorithm  [80]. 

• The rational Krylov algorithm of Ruhe [69, 70, 71].

•  The  ARPACK  software  package  [49]. 

•  The  simultaneous  iteration  algorithm  of  Stewart  and  Jennings  [91,  92]. 

•  The  two  subspace  iteration  codes  EA12  of  Duff  and  Scott  [28],  and  SRRIT 
of  Bai  and  Stewart  [10]. 

Other important issues include comparing Algorithms 4.2 and 4.7 of Chapter 4.
Finally, a study comparing the performance of the codes in terms of storage
requirements, execution times, and accuracy, and considering their suitability
for solving large-scale industrial problems is underway [48].
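
The tradeoff raised in item 6, matrix vector products versus factorizations and linear solves, can be illustrated with a minimal sketch. It is not taken from the thesis: the function names, the 3 by 3 test matrix, and the use of plain power iteration in place of an Arnoldi iteration are all assumptions made for illustration. The polynomial filter applies (A - μI) factors using matrix vector products only, with the shifts μ set to the exact unwanted eigenvalues of a tiny triangular matrix (in practice they would be unwanted Ritz values). The shift and invert variant instead solves a linear system with A - σI at every step.

```python
import math

def matvec(A, v):
    """Return A v for a dense matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def normalize(v):
    """Scale v to unit Euclidean length."""
    nrm = math.sqrt(sum(x * x for x in v))
    return [x / nrm for x in v]

def rayleigh_quotient(A, v):
    """Eigenvalue estimate v^T A v / v^T v."""
    w = matvec(A, v)
    return sum(x * y for x, y in zip(v, w)) / sum(x * x for x in v)

def polynomial_filter(A, v, shifts):
    """Apply prod_j (A - mu_j I) to v, one matvec per shift, no solves."""
    for mu in shifts:
        w = matvec(A, v)
        v = normalize([wi - mu * vi for wi, vi in zip(w, v)])
    return v

def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(M, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def shift_invert_iteration(A, sigma, steps=40):
    """Power iteration on (A - sigma I)^{-1}; one linear solve per step.

    A production code would factor A - sigma I once and reuse the factors.
    """
    n = len(A)
    shifted = [[A[i][j] - (sigma if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    v = normalize([1.0] * n)
    for _ in range(steps):
        v = normalize(solve(shifted, v))
    # theta approximates the dominant eigenvalue 1/(lambda - sigma) of the inverse.
    theta = sum(x * y for x, y in zip(v, solve(shifted, v)))
    return sigma + 1.0 / theta  # map back to an eigenvalue of A

# Hypothetical nonsymmetric test matrix with eigenvalues 1, 4, and 10.
A = [[1.0, 2.0, 0.0],
     [0.0, 4.0, 1.0],
     [0.0, 0.0, 10.0]]

# Filter out the eigenvalues 1 and 10 with two matvecs.
filtered = polynomial_filter(A, [1.0, 1.0, 1.0], shifts=[1.0, 10.0])
print(rayleigh_quotient(A, filtered))

# Target the eigenvalue nearest sigma = 3.8 with repeated solves.
print(shift_invert_iteration(A, sigma=3.8))
```

Both calls target the interior eigenvalue 4 of this toy matrix. The filter reaches it with two matrix vector products only because the shifts are exact eigenvalues, whereas the shift and invert iteration needs only a rough shift σ but pays for a linear solve at every step.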